How to Convert Any Text Into a Graph of Concepts

Author:Murphy  |  View: 20778  |  Time: 2025-03-23 12:12:27
Image generated by the author using the project shared in this article.

A few months ago, knowledge-based QnA (KBQA) was a novelty. Now KBQA with Retrieval Augmented Generation (RAG) is a piece of cake for any AI enthusiast. It's fascinating to see how the realm of possibilities in NLP has expanded so rapidly due to LLMs. And it's getting better by the day.

In my last article, I shared a recursive RAG approach to implement QnA with multi-hop reasoning to answer complex queries based on a large corpus of text.

The Research Agent: Addressing the Challenge of Answering Questions Based on a Large Text Corpus

A good number of folks tried it out and sent their feedback. Thank you all. I have since collated these contributions and made a few improvements to the code to address some of the problems with the original implementation. I plan to write a separate article about it.

In this article, I want to share another idea that may help create a super research agent when combined with recursive RAG. The idea emerged out of my experiments with recursive RAG using smaller LLMs, and from a few other ideas that I read about on Medium, specifically Knowledge-Graph Augmented Generation.

Abstract

A Knowledge Graph (KG), like any graph, is made up of nodes and edges. Each node of the KG represents a concept, and each edge is a relationship between a pair of such concepts. In this article, I will share a method to convert any text corpus into a Graph of Concepts. I am using the term 'Graph of Concepts' (GC) interchangeably with the term KG to better describe what I am demoing here.

All the components I used in this implementation can be set up locally, so this project can be run easily on a personal machine. I have adopted a no-GPT approach here because I believe in smaller open-source models. I am using the fantastic Mistral 7B OpenOrca instruct and Zephyr models. These models can be set up locally with Ollama.

Databases like Neo4j make it easy to store and retrieve graph data. Here I am using in-memory pandas DataFrames and the NetworkX Python library to keep things simple.
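To make the nodes-and-edges idea concrete, here is a minimal sketch of how a tiny graph of concepts could be held in a pandas DataFrame and loaded into NetworkX. The column names (`source`, `target`, `relation`) and the three example rows are assumptions made up for illustration; they are not taken from the project's code.

```python
import pandas as pd
import networkx as nx

# Hypothetical edge list: one row per relationship between two concepts.
# Column names and rows are illustrative assumptions, not the project's schema.
edges = pd.DataFrame(
    [
        {"source": "knowledge graph", "target": "concept", "relation": "has nodes representing"},
        {"source": "knowledge graph", "target": "relationship", "relation": "has edges representing"},
        {"source": "relationship", "target": "concept", "relation": "connects pairs of"},
    ]
)

# Build an undirected graph; each DataFrame row becomes one edge,
# and the 'relation' column is stored as an edge attribute.
G = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="relation"
)

# Walk the graph: every edge is a (concept, concept, attributes) triple.
for node_1, node_2, data in G.edges(data=True):
    print(f"{node_1} --[{data['relation']}]--> {node_2}")
```

Keeping the edge list in a DataFrame means it can be filtered, grouped, or merged with ordinary pandas operations before the graph is built, which is one reason an in-memory setup like this can be enough for small experiments.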

Our goal here is to convert any text corpus into a Graph of Concepts (GC) and visualise it like the beautiful banner image of this article. We will even interact with the network graph by moving nodes and edges, zooming in and out, and changing the physics of the graph to our heart's desire. Here is the GitHub page link that shows the result of what we are building.

https://rahulnyk.github.io/knowledge_graph/

But first, let's delve into the fundamental idea of KGs and why we need them. If you are familiar with this concept already, feel free to skip the next section.


Knowledge Graph

Consider the following text.

Mary had a little lamb, You've heard this tale before; But did you know she passed her plate, And had a little more!

(I hope the kids are not reading this

