Running the STORM AI research system with your local documents

TL;DR
The use of LLM agents is becoming more common for tackling multi-step, long-context research tasks where traditional RAG and direct prompting methods can struggle. In this article, we will explore a promising new technique developed at Stanford called Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM), which uses LLM agents to simulate ‘Perspective-guided conversations' in pursuit of complex research goals and to generate rich research articles that humans can use in their pre-writing research. STORM was initially developed to gather information from web sources, but it also supports searching a local document vector store. We will see how to run STORM for AI-supported research on local PDFs, using US FEMA disaster preparedness and assistance documentation.
It's been amazing to watch how using LLMs for knowledge retrieval has progressed in a relatively short period of time. Since the first paper on Retrieval Augmented Generation (RAG) in 2020, the ecosystem has grown to include a cornucopia of techniques. One of the more advanced is agentic RAG, where LLM agents iterate on and refine document retrieval in order to solve more complex research tasks. It's similar to how a human might carry out research: exploring a range of different search queries to build a better picture of the context, sometimes discussing the topic with other people, and synthesizing everything into a final result. Single-turn RAG, even when employing techniques such as query expansion and reranking, can struggle with these more complex, multi-hop research tasks.
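To make that contrast concrete, here is a deliberately simplified sketch of the agentic loop. This is illustrative only, not STORM's implementation; the retrieval and LLM steps are passed in as hypothetical callables rather than tied to any particular framework.

```python
from typing import Callable, Optional

def agentic_research(
    topic: str,
    retrieve: Callable[[str], list[str]],                     # e.g. a vector-store search
    refine_query: Callable[[str, list[str]], Optional[str]],  # LLM: next query, or None to stop
    synthesize: Callable[[str, list[str]], str],              # LLM: write the answer from notes
    max_turns: int = 5,
) -> str:
    """Single-turn RAG retrieves once and answers; an agentic loop keeps
    refining its queries until it decides it has enough context."""
    notes: list[str] = []
    query: Optional[str] = topic
    for _ in range(max_turns):
        if query is None:
            break
        notes.extend(retrieve(query))          # gather evidence for the current query
        query = refine_query(topic, notes)     # decide what to ask next, if anything
    return synthesize(topic, notes)            # write the final answer from all notes
```

The key difference from single-turn RAG is simply the loop: the system keeps deciding what it still needs to know before it writes anything.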
There are quite a few patterns for knowledge retrieval using agent frameworks such as Autogen, CrewAI, and LangGraph as well as specific AI research assistants such as GPT Researcher. In this article, we will look at an LLM-powered research writing system from Stanford University, called Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM).
STORM AI research writing system
STORM applies a clever technique in which LLM agents simulate ‘Perspective-guided conversations' to reach a research goal, and extends ‘outline-driven RAG' for richer article generation.
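In very rough terms, the idea looks something like the sketch below. To be clear, this is not STORM's actual code; the helper callables (discover_perspectives, ask_question, and so on) are hypothetical stand-ins for its LLM prompts and retrieval calls, and the real pipeline is considerably more sophisticated.

```python
from typing import Callable

def perspective_guided_research(
    topic: str,
    discover_perspectives: Callable[[str], list[str]],   # LLM: who would care about this topic?
    ask_question: Callable[[str, str, list[str]], str],  # LLM acting as that perspective
    answer_with_retrieval: Callable[[str], str],         # LLM grounded in retrieved sources
    draft_outline: Callable[[str, list[str]], str],      # LLM: outline distilled from the dialogue
    write_section: Callable[[str, str], str],            # LLM: write one outline section
    turns_per_perspective: int = 3,
) -> str:
    dialogue: list[str] = []
    for persona in discover_perspectives(topic):
        history: list[str] = []
        for _ in range(turns_per_perspective):
            question = ask_question(topic, persona, history)
            answer = answer_with_retrieval(question)
            history += [question, answer]
        dialogue += history
    outline = draft_outline(topic, dialogue)
    return "\n\n".join(
        write_section(topic, heading)
        for heading in outline.splitlines()
        if heading.strip()
    )
```

Even in this toy version you can see the two ingredients the paper highlights: questions come from distinct perspectives rather than a single prompt, and the article is written section by section from an outline instead of in one pass.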
Configured to generate Wikipedia-style articles, it was tested with a cohort of 10 experienced Wikipedia editors.

Reception was positive on the whole: 70% of the editors felt it would be a useful tool in their pre-writing stage when researching a topic. I hope future surveys will include more than 10 editors, but it should be noted that the authors also benchmarked STORM against prior article generation methods on FreshWiki, a dataset of recent high-quality Wikipedia articles, where it was found to outperform them.

STORM is open source and available as a Python package, with additional implementations using frameworks such as LangGraph. More recently, STORM has been extended to support human-AI collaborative knowledge curation through Co-STORM, putting a human right in the center of the AI-assisted research loop.
Though it significantly outperforms baseline methods in both automatic and human evaluations, there are some caveats the authors acknowledge. It isn't yet multimodal; it doesn't produce content of experienced-human quality (though I feel it isn't positioned for this yet, being targeted more at pre-writing research than at final articles); and there are some nuances around references that require future work. That said, if you have a deep research task, it's worth checking out.
You can try out STORM online – it's fun! – configured to perform research using information on the web.
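For reference, the same pipeline is available through the knowledge-storm Python package mentioned above (installed with `pip install knowledge-storm`), and running it against web search looks roughly like the example below. This is adapted from the package README at the time of writing, so treat the class names, setter methods, and arguments as version-dependent rather than definitive.

```python
import os
from knowledge_storm import STORMWikiLMConfigs, STORMWikiRunner, STORMWikiRunnerArguments
from knowledge_storm.lm import OpenAIModel
from knowledge_storm.rm import YouRM

# Cheaper model for the simulated conversations, stronger model for writing.
lm_configs = STORMWikiLMConfigs()
openai_kwargs = {"api_key": os.getenv("OPENAI_API_KEY"), "temperature": 1.0, "top_p": 0.9}
fast_lm = OpenAIModel(model="gpt-4o-mini", max_tokens=500, **openai_kwargs)
strong_lm = OpenAIModel(model="gpt-4o", max_tokens=3000, **openai_kwargs)
lm_configs.set_conv_simulator_lm(fast_lm)
lm_configs.set_question_asker_lm(fast_lm)
lm_configs.set_outline_gen_lm(strong_lm)
lm_configs.set_article_gen_lm(strong_lm)
lm_configs.set_article_polish_lm(strong_lm)

engine_args = STORMWikiRunnerArguments(output_dir="./results")

# Web search retrieval (You.com in the README example).
rm = YouRM(ydc_api_key=os.getenv("YDC_API_KEY"), k=engine_args.search_top_k)

runner = STORMWikiRunner(engine_args, lm_configs, rm)
runner.run(
    topic="Hurricane preparedness",  # example topic
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True,
)
runner.post_run()
```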
But what about running STORM with your own data?
Many organizations will want to use AI research tools with their own internal data. The STORM authors have done a nice job of documenting various approaches to using STORM with different LLM providers and with a local vector database, which means it is possible to run STORM on your own documents.
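In practice, the switch to local documents comes down to swapping the retrieval module: instead of a web search retriever, you pass the runner a retriever backed by a local (Qdrant) vector store. The sketch below is based on the VectorRM example in the STORM repository; the exact argument names and the embedding model are assumptions that may differ between package versions.

```python
from knowledge_storm.rm import VectorRM

# Retrieval module backed by a local Qdrant vector store instead of web search.
rm = VectorRM(
    collection_name="fema_docs",    # Qdrant collection holding our chunked PDFs (assumed name)
    embedding_model="BAAI/bge-m3",  # must match the model used when the collection was built
    device="cpu",
    k=5,                            # passages returned per search query
)
rm.init_offline_vector_db(vector_store_path="./vector_store")

# The runner is then constructed exactly as in the web-search example above:
# runner = STORMWikiRunner(engine_args, lm_configs, rm)
```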
So let's try this out!
Setup and code
You can find the code for this article here, which includes environment setup instructions and steps to collate some sample documents for this demo.
FEMA disaster preparedness and assistance documentation
We will use 34 PDF documents created by the United States Federal Emergency Management Agency (FEMA) to help people prepare for and respond to disasters. These aren't the sort of documents people typically use to write deep research articles, but I'm interested in seeing how AI can help people prepare for disasters.
… and I already have code for processing FEMA reports from some earlier blog posts, which I've included in the linked repo above.
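If you'd rather roll your own document preparation than reuse that code, the gist is simply to extract the PDF text, split it into chunks with a little metadata, and embed those chunks into the vector store that STORM will query. Here is a minimal, generic sketch; it is not the repo's processing code, and the paths, chunk sizes, and field names are arbitrary choices.

```python
from pathlib import Path
from pypdf import PdfReader

def chunk_pdf(path: Path, chunk_chars: int = 1000, overlap: int = 200) -> list[dict]:
    """Extract text from one PDF and split it into overlapping character chunks."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    chunks = []
    for start in range(0, len(text), chunk_chars - overlap):
        chunks.append({
            "title": path.stem,
            "source": str(path),
            "content": text[start : start + chunk_chars],
        })
    return chunks

all_chunks = [c for pdf in Path("./fema_docs").glob("*.pdf") for c in chunk_pdf(pdf)]
# These chunks are then embedded and written to the Qdrant collection
# ("fema_docs" in the earlier sketch) that VectorRM queries during research.
```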