
RAG Variants Explained: Classic RAG, Graph RAG, HyDE, and RAGFusion

Retrieval-Augmented Generation (RAG) has become the go-to technique for making large language models (LLMs) useful in real-world scenarios — especially when working with custom or domain-specific knowledge.

But RAG is no longer just one thing.

Over time, researchers and practitioners have developed multiple RAG variants, each with its own strengths and ideal use cases. In this blog, we’ll break down the most popular types of RAG:

  • Classic RAG
  • Graph RAG
  • HyDE (Hypothetical Document Embeddings)
  • Multi-hop RAG
  • RAGFusion

1. Classic RAG

This is the standard setup where a user question is used to retrieve relevant documents from a vector store. These docs are then passed to an LLM for answer generation.

How it works:

  1. Embed documents and store in a vector DB (like FAISS, Qdrant, Weaviate)
  2. Embed the query
  3. Retrieve top-K documents
  4. Feed context + question to LLM

Code Example (LangChain):

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

# Load and split documents
loader = TextLoader("data/legal_corpus.txt")
docs = loader.load()
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Create vector DB
embeddings = HuggingFaceEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

# Create retrieval-augmented QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectorstore.as_retriever()
)

# Run
response = qa_chain.run("What is the penalty for breach of contract?")
print(response)
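
If your corpus doesn't change often, you can persist the FAISS index and reload it in later runs instead of re-embedding everything. A minimal sketch using LangChain's save_local/load_local helpers (the folder name legal_faiss_index is just an example, and newer LangChain versions may also require allow_dangerous_deserialization=True when loading):

# Persist the index so later runs can skip the embedding step
vectorstore.save_local("legal_faiss_index")

# In a later run or a separate process
restored = FAISS.load_local("legal_faiss_index", embeddings)
qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=restored.as_retriever())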

2. Graph RAG

Graph RAG adds structure and memory to standard RAG. Instead of one-shot retrieval and generation, you create a graph where each node represents a topic/domain/task. Nodes can:

  • Hold their own retriever
  • Use schema-aware prompts
  • Interact with each other for sub-questions or missing fields

When to use:

  • Multi-domain knowledge extraction (legal, medical, finance)
  • Schema-based output using Pydantic
  • Smart retries and follow-up questioning

Code Snippet:

class GraphNode:
    def __init__(self, retriever, prompt_template, schema):
        self.retriever = retriever              # any retriever exposing .retrieve(query) -> context string
        self.prompt_template = prompt_template  # prompt with {query} and {context} placeholders
        self.schema = schema                    # Pydantic model describing the structured output

    def run(self, query):
        context = self.retriever.retrieve(query)
        prompt = self.prompt_template.format(query=query, context=context)
        response = query_llm(prompt)            # helper that calls your LLM and returns its raw text
        return self.schema.parse_raw(response)  # assumes the LLM replies with JSON matching the schema

# Usage (domain-specific retrievers, prompts, and schemas are defined elsewhere; see the sketch below)
medical_node = GraphNode(medical_retriever, medical_prompt, MedicalEntitySchema)
legal_node = GraphNode(legal_retriever, legal_prompt, LegalEntitySchema)

result = legal_node.run("Extract contract duration and penalty terms")
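
The snippet above leans on pieces that aren't shown: retrievers exposing a .retrieve() method, a query_llm helper, and Pydantic schemas. A minimal sketch of what those might look like for the legal node (the names, prompt text, and wrapper class are illustrative, not a fixed API):

from pydantic import BaseModel
from langchain.llms import OpenAI

class LegalEntitySchema(BaseModel):
    contract_duration: str
    penalty_terms: str

class SimpleRetriever:
    # Thin wrapper that gives any LangChain retriever the .retrieve(query) interface used above
    def __init__(self, lc_retriever):
        self.lc_retriever = lc_retriever

    def retrieve(self, query):
        docs = self.lc_retriever.get_relevant_documents(query)
        return "\n\n".join(doc.page_content for doc in docs)

def query_llm(prompt):
    # Any completion call works here; LangChain's OpenAI wrapper keeps it consistent with section 1
    return OpenAI()(prompt)

legal_prompt = (
    "Context:\n{context}\n\n"
    "Task: {query}\n"
    "Reply with JSON containing the fields contract_duration and penalty_terms."
)

legal_retriever = SimpleRetriever(vectorstore.as_retriever())  # vectorstore from the Classic RAG example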

3. HyDE (Hypothetical Document Embeddings)

HyDE takes a different approach. Instead of retrieving based on the raw query, it generates a hypothetical answer first, then embeds that hypothetical response to retrieve documents.

This often leads to more relevant retrieval because the generated content is semantically richer than the raw question.

When to use:

  • Sparse data settings
  • Poorly structured corpora
  • Cases where queries are ambiguous

Code Snippet:

from transformers import pipeline
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss

# Step 1: Generate a hypothetical answer (gpt2 is just a small stand-in; any instruction-following LLM works)
llm = pipeline("text-generation", model="gpt2")
query = "What is Section 420 IPC?"
hypothetical = llm(f"Answer the following: {query}", max_length=100)[0]["generated_text"]

# Step 2: Embed the hypothetical answer instead of the raw query
embedder = SentenceTransformer("all-MiniLM-L6-v2")
query_embedding = embedder.encode(hypothetical)

# Step 3: Retrieve nearest chunks from a prebuilt FAISS index (search expects a 2D float32 array)
index = faiss.read_index("legal_faiss.index")
D, I = index.search(np.array([query_embedding], dtype="float32"), 5)
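
The search only returns row indices, so you still need to map them back to chunk texts and generate the final answer from the original query plus the retrieved context. A short follow-up, assuming chunk_texts is the list of chunk strings used to build legal_faiss.index, in the same order (that list is not part of the snippet above):

# Step 4: Map indices back to chunk texts and answer using the original query
retrieved = [chunk_texts[i] for i in I[0]]
context = "\n\n".join(retrieved)

answer = llm(f"Context:\n{context}\n\nQuestion: {query}", max_new_tokens=100)[0]["generated_text"]
print(answer)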

4. Multi-Hop RAG

Multi-hop RAG chains together multiple retrieval + generation steps. It mimics how humans reason: answering one sub-question at a time.

When to use:

  • Complex QA that needs reasoning across multiple chunks
  • Multi-document fact synthesis

High-Level Flow:

  1. Decompose query into sub-questions
  2. Retrieve answers to each
  3. Synthesize final answer

Example Prompt Chain:

from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Step 1: Break the question into sub-questions
breakdown_prompt = PromptTemplate.from_template("Break this down into sub-questions: {question}")
breakdown_chain = LLMChain(llm=OpenAI(), prompt=breakdown_prompt, output_key="points")

# Step 2: Answer each part (run RetrievalQA over every sub-question; see the sketch after this block)
# Step 3: Combine the partial answers into a final answer
combine_prompt = PromptTemplate.from_template("Summarize: {points}")
combine_chain = LLMChain(llm=OpenAI(), prompt=combine_prompt, output_key="answer")

# Final chain: the output_key names let the breakdown output feed the combine prompt
qa_chain = SequentialChain(
    chains=[breakdown_chain, combine_chain],
    input_variables=["question"],
    output_variables=["answer"]
)
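
The SequentialChain above still leaves step 2 as a comment. One way to fill it in is a plain Python loop that runs the Classic RAG chain from section 1 over each sub-question. This sketch assumes the breakdown prompt returns one sub-question per line and reuses the vectorstore built earlier; the example question is illustrative:

from langchain.chains import RetrievalQA

# Step 2 in full: answer each sub-question with retrieval, then summarize
question = "How do penalty clauses and limitation periods interact in breach-of-contract claims?"
sub_questions = [q for q in breakdown_chain.run(question=question).split("\n") if q.strip()]

retrieval_qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=vectorstore.as_retriever())
partial_answers = [retrieval_qa.run(q) for q in sub_questions]

final_answer = combine_chain.run(points="\n".join(partial_answers))
print(final_answer)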

5. RAGFusion

RAGFusion is an advanced ensemble-based variant where multiple retrievers are queried in parallel (e.g., using different embedding models or retrieval strategies). The top results are then fused together, often using rank aggregation, before passing to the LLM.

This improves robustness and reduces the risk of missing critical context due to a single retriever’s limitations.

When to use:

  • You have multiple embedding models or retrievers
  • You want to increase recall while maintaining relevance
  • Hybrid search scenarios (dense + sparse retrieval)

Code Snippet:

from langchain.retrievers import EnsembleRetriever
from langchain.vectorstores import FAISS, Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Initialize two different retrievers (a FAISS index and a Chroma store)
embeddings = HuggingFaceEmbeddings()
retriever1 = FAISS.load_local("index1", embeddings).as_retriever()
retriever2 = Chroma(persist_directory="./chroma_legal", embedding_function=embeddings).as_retriever()

# Create RAGFusion-style ensemble retriever
ensemble = EnsembleRetriever(retrievers=[retriever1, retriever2], weights=[0.5, 0.5])

# Use in a QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=ensemble,
    return_source_documents=True
)

# run() only supports single-output chains, so call the chain directly
response = qa_chain({"query": "What are the remedies for breach of contract in India?"})
print(response["result"])
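
EnsembleRetriever handles the fusion internally using reciprocal rank fusion, but if you want to see the rank-aggregation step itself, here is a minimal reciprocal rank fusion (RRF) sketch. The original RAGFusion recipe also asks an LLM to generate several query variants first; the variants and the k constant below are illustrative:

# Reciprocal Rank Fusion (RRF): fuse several ranked result lists into one ranking
def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked):
            key = doc.page_content
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first

# Illustrative query variants (RAGFusion usually generates these with an LLM)
query_variants = [
    "What are the remedies for breach of contract in India?",
    "What damages can be claimed for breach of contract under Indian law?",
    "Is specific performance available as a remedy for contract breach in India?",
]

ranked_lists = [retriever1.get_relevant_documents(q) for q in query_variants]
fused = reciprocal_rank_fusion(ranked_lists)
print(fused[:3])  # top fused chunks, ready to pass to the LLM as context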

Conclusion

RAG Variant    | Best For
Classic RAG    | Fast answers from a known corpus
Graph RAG      | Structured, multi-domain schema extraction
HyDE           | Improving retrieval quality in sparse/ambiguous settings
Multi-hop RAG  | Complex reasoning over multiple facts/documents
RAGFusion      | Combining strengths of multiple retrievers

As RAG systems evolve, choosing the right variant depends on your data, use case, and performance requirements. You can also mix and match these methods for even better results.
