Building a GraphRAG Pipeline Using Neo4j and LangChain

Retrieval-Augmented Generation (RAG) has become a widely used approach for enabling large language models (LLMs) to access external information in real time. It works by retrieving relevant documents or chunks from a vector database and using them as context for generating answers.

However, traditional RAG systems often treat information in a flat structure. While they are effective at finding semantically similar content, they can miss the deeper relationships between entities and concepts. This is where GraphRAG offers a significant advantage—by integrating knowledge graphs with generative models.

The following guide outlines a practical GraphRAG pipeline using Neo4j for graph storage and LangChain for connecting to LLMs. The focus is on blending structured and unstructured data to generate more connected, contextual responses.

What is GraphRAG?

GraphRAG is an enhanced version of RAG that leverages a knowledge graph—a structured representation of entities and their relationships—to supplement the context provided to an LLM.

For example:

  • Traditional RAG might retrieve: “John Doe worked at Company X.”
  • GraphRAG can combine: “John Doe → employed_by → Company X → litigated_by → Court Y.”

This structured context allows for better multi-hop reasoning and grounded answers.
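To see why structured triples compose into multi-hop context, here is a minimal, self-contained sketch (plain Python with made-up triples, no Neo4j involved) that chains (head, relation, tail) tuples into the kind of path shown above:

```python
def chain_triples(triples, start):
    """Follow (head, relation, tail) tuples from a starting entity."""
    index = {head: (rel, tail) for head, rel, tail in triples}
    path, current, seen = [start], start, {start}
    while current in index:
        rel, tail = index[current]
        path.extend([rel, tail])
        if tail in seen:  # guard against cycles in the toy index
            break
        seen.add(tail)
        current = tail
    return " → ".join(path)

# Illustrative triples matching the example above
triples = [
    ("John Doe", "employed_by", "Company X"),
    ("Company X", "litigated_by", "Court Y"),
]
print(chain_triples(triples, "John Doe"))
# → John Doe → employed_by → Company X → litigated_by → Court Y
```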

Technology Stack

  • Neo4j: Stores and manages the knowledge graph.
  • LangChain: Manages the retrieval process and communicates with the LLM.
  • OpenAI / Llama models: Generate the final answers.
  • FAISS (optional): For additional semantic vector retrieval.

Step 1: Set Up Neo4j

Start by launching a local Neo4j instance (or use Neo4j Aura). Then install the Neo4j Python driver:

pip install neo4j

Step 2: Build the Graph

Assume a legal text dataset is available. Using entity and relation extraction (e.g., with SpaCy or an LLM), relationships between entities can be created in Neo4j.
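Before the Neo4j code, it may help to see the shape of the extraction step. The snippet below is a deliberately simplified, rule-based extractor (a real pipeline would use SpaCy's dependency parser or an LLM); the relation patterns and sentences are made up for illustration:

```python
import re

# Toy extractor: matches "<subject> <verb phrase> <object>" for a fixed set
# of relation verbs, just to show the (head, relation, tail) output shape.
RELATION_PATTERNS = {
    "employs": r"(?P<head>[\w ]+?) employs (?P<tail>[\w ]+)",
    "sued_by": r"(?P<head>[\w ]+?) was sued by (?P<tail>[\w ]+)",
}

def extract_triples(text):
    triples = []
    for relation, pattern in RELATION_PATTERNS.items():
        for m in re.finditer(pattern, text):
            triples.append((m.group("head").strip(), relation, m.group("tail").strip()))
    return triples

print(extract_triples("Company X employs John Doe. John Doe was sued by Court Y."))
# → [('Company X', 'employs', 'John Doe'), ('John Doe', 'sued_by', 'Court Y')]
```

Each extracted triple feeds directly into the graph-creation code below.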

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def create_graph(tx, head, relation, tail):
    # MERGE is idempotent: nodes and relationships are only created if missing
    tx.run("""
        MERGE (a:Entity {name: $head})
        MERGE (b:Entity {name: $tail})
        MERGE (a)-[:RELATION {type: $relation}]->(b)
    """, head=head, tail=tail, relation=relation)

with driver.session() as session:
    # execute_write replaces the deprecated write_transaction in neo4j 5.x
    session.execute_write(create_graph, "Company X", "employs", "John Doe")
    session.execute_write(create_graph, "John Doe", "sued_by", "Court Y")

This builds a graph like:

Company X → employs → John Doe → sued_by → Court Y

Step 3: Query the Graph

Define a Cypher query to fetch relationships relevant to the user query.

def query_graph(term):
    with driver.session() as session:
        result = session.run("""
            MATCH (a:Entity)-[r:RELATION]->(b:Entity)
            WHERE a.name CONTAINS $term OR b.name CONTAINS $term
            RETURN a.name AS head, r.type AS rel, b.name AS tail
        """, term=term)

        # r.type reads the stored relation property; type(r) would always
        # return the label RELATION
        return [f"{record['head']} --{record['rel']}--> {record['tail']}" for record in result]

This retrieves entity relationships that are relevant to the input term.

Step 4: Use the Graph as LLM Context

LangChain can be used to pass the graph facts to an LLM.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

llm = ChatOpenAI()

prompt_template = PromptTemplate.from_template("""
Use the following facts extracted from a knowledge graph:
{facts}

Answer this question:
{question}
""")

def generate_answer(question):
    facts = query_graph(question)
    context = "\n".join(facts)
    # Prompt and model composed as a runnable chain (replaces the deprecated LLMChain)
    chain = prompt_template | llm
    return chain.invoke({"facts": context, "question": question}).content

print(generate_answer("What is the relationship between John Doe and Company X?"))

Example output:

John Doe is employed by Company X and is involved in a legal dispute with Court Y.

Bonus: Add Hybrid Retrieval

To increase context coverage, combine graph results with semantic retrieval using FAISS:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# doc_chunks: a list of LangChain Document objects prepared elsewhere
faiss_index = FAISS.from_documents(doc_chunks, OpenAIEmbeddings())

def hybrid_context(question):
    graph_facts = query_graph(question)
    semantic_docs = faiss_index.similarity_search(question, k=3)
    semantic_facts = [doc.page_content for doc in semantic_docs]
    return "\n".join(graph_facts + semantic_facts)

This approach blends symbolic relationships with dense retrieval.
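One practical wrinkle: graph facts and retrieved chunks often overlap. A small, hypothetical dedup helper (names and limits are illustrative) keeps the combined prompt compact:

```python
def merge_context(graph_facts, semantic_facts, max_items=10):
    """Merge graph facts and semantic chunks, dropping near-duplicates."""
    seen = set()
    merged = []
    for fact in graph_facts + semantic_facts:  # graph facts first: they are more precise
        key = " ".join(fact.lower().split())   # normalize whitespace and case
        if key not in seen:
            seen.add(key)
            merged.append(fact)
        if len(merged) == max_items:
            break
    return merged

print(merge_context(
    ["Company X --employs--> John Doe"],
    ["Company X  --employs-->  John Doe", "Court Y filed suit in 2021."],
))
# → ['Company X --employs--> John Doe', 'Court Y filed suit in 2021.']
```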

Challenges and Considerations

Graph Syncing

Updating the graph continuously as documents change is non-trivial and requires automation.

Reasoning Over Multiple Hops

Complex multi-hop traversal in Cypher can be powerful but requires thoughtful query design.
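As a sketch, a variable-length Cypher pattern can express such traversals. The helper below is illustrative; note that Cypher does not accept query parameters inside relationship-length ranges, so the (validated) hop count is inlined:

```python
def build_multihop_query(max_hops=3):
    """Build a Cypher query traversing 1..max_hops RELATION edges."""
    if not 1 <= int(max_hops) <= 5:
        raise ValueError("keep traversal depth small to avoid explosive fan-out")
    # Cypher forbids parameters in the *1..N range, so the bound is inlined
    return f"""
        MATCH path = (a:Entity {{name: $term}})-[:RELATION*1..{int(max_hops)}]->(b:Entity)
        RETURN [n IN nodes(path) | n.name] AS entities,
               [r IN relationships(path) | r.type] AS relations
    """

print(build_multihop_query(2))
```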

Performance

Large graphs need filtering, ranking, or compression to reduce noise and improve speed.
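One simple mitigation is to rank candidate facts against the question and keep only the top-k. The scoring below (plain word overlap) is a deliberately naive placeholder for a proper embedding-based ranker:

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation and arrows."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_k_facts(facts, question, k=5):
    """Keep the k facts sharing the most words with the question."""
    q = tokens(question)
    return sorted(facts, key=lambda f: len(q & tokens(f)), reverse=True)[:k]

facts = [
    "Company X --employs--> John Doe",
    "Court Y --located_in--> City Z",
    "Company X --founded_in--> 1998",
]
print(top_k_facts(facts, "Who employs John Doe?", k=2))
```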

Summary

GraphRAG with Neo4j enables LLMs to respond with richer, more connected answers. It’s especially useful in domains like law, healthcare, and enterprise knowledge where entity relationships matter.

