Solving Hallucinations in RAG Systems

Retrieval-Augmented Generation (RAG) is a powerful framework that brings together large language models (LLMs) and external knowledge to create more accurate, context-aware answers. But even with RAG, developers often encounter one frustrating problem: hallucinations — answers that sound plausible but are completely wrong or made up.

In this blog, we’re going beyond installation and setup. We’ll walk you through a real-world case where a RAG system was hallucinating facts and how we debugged and resolved it.

The Problem: “My RAG System Is Still Hallucinating!”

Imagine this scenario:

You provide your chatbot with a comprehensive dataset of your company’s product inventory. You then ask:

“Do we have any eco-friendly office chairs in stock?”

The chatbot responds with a detailed description of an “EcoComfort ErgoChair,” highlighting its sustainable materials and ergonomic design. However, upon checking your inventory, you realize that such a product doesn’t exist in your catalog.

Despite integrating FAISS, OpenAI embeddings, and LangChain into your RAG system, you’re still receiving fabricated responses, commonly known as “hallucinations.”

Diagnosis: Why Was Our RAG System Hallucinating?

RAG Setup Recap:

  • Document: A 25-page legal contract
  • Vector Store: FAISS
  • Chunking: 1000 tokens, no overlap
  • LLM: GPT-3.5 via LangChain’s RetrievalQA

Result: The model sounded smart but often made things up, especially when asked “why,” “what if,” or “compare” questions.
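
For reference, here is a minimal sketch of that baseline pipeline. The import paths, the contract.pdf filename, and the example query are illustrative assumptions (LangChain import locations vary by version), not an exact copy of the production code:

```python
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Load the 25-page contract and cut it into ~1000-character chunks with no overlap
# (a character-based stand-in for the original "1000 tokens, no overlap" setting).
docs = PyPDFLoader("contract.pdf").load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)

# Embed the chunks, index them in FAISS, and wire the retriever into RetrievalQA.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# "Why" and "what if" questions were the ones that triggered hallucinations.
response = qa_chain.invoke({"query": "Why does the contract require this clause?"})
print(response["result"])
```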

Root Causes Identified

1. Chunking Strategy Was Too Naive

  • Long paragraphs were being chopped mid-sentence.
  • Retrieved chunks lost their surrounding context, and the LLM filled in the gaps.

Fix: Switch to RecursiveCharacterTextSplitter, which recursively splits on a prioritized list of separators (paragraphs, then sentences, then words) so chunks break at natural boundaries instead of mid-sentence.
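
A minimal sketch of the switch, reusing the docs loaded in the baseline above; the chunk size, overlap, and separator list are illustrative values rather than the exact ones we shipped:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Try paragraph, line, sentence, and word boundaries (in that order) before ever
# falling back to raw characters, so chunks stop being cut mid-sentence.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,  # a modest overlap carries neighbouring context into each chunk
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)

# Rebuild the index from the new chunks.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
```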

2. Retrieval Was Returning Irrelevant Chunks

  • The FAISS retriever sometimes returned chunks that matched the question’s keywords closely but not its semantic intent, and several of them were often near-duplicates of one another.

Fix: Used MMR (Maximal Marginal Relevance) retriever to diversify retrieved content:
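
In LangChain this is a small change on the retriever; the k, fetch_k, and lambda_mult values below are illustrative defaults rather than the tuned numbers from our tests:

```python
# Maximal Marginal Relevance: fetch a wider candidate pool (fetch_k), then keep the
# k chunks that are relevant to the query but dissimilar to one another.
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5},
)
```

The retriever then drops into the same RetrievalQA chain as before.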

This increased answer accuracy by over 30% in our internal tests.

3. The Prompt Was Too Generic

  • LangChain’s default prompt didn’t ground the LLM enough in the context.

Fix: Customized the prompt to explicitly instruct the model to “only use information from the provided context.”
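
A sketch of that grounding prompt, passed to RetrievalQA through chain_type_kwargs; the wording below is our own illustration, not LangChain’s default template:

```python
from langchain.prompts import PromptTemplate

grounded_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using ONLY the information in the context below.\n"
        "If the context does not contain the answer, say that you cannot find it "
        "in the provided documents.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": grounded_prompt},  # replaces the generic default prompt
)
```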

Result After Fixes

Metric         | Before             | After
Accuracy       | Medium             | High
Hallucinations | Frequent           | Rare
User Trust     | Low                | High
Latency        | Slightly increased | Acceptable

Other Tips to Prevent Hallucinations

  • Reduce the number of retrieved docs: sometimes fewer, more relevant chunks work better than many.
  • Use re-rankers like Cohere Rerank or ColBERT after retrieval.
  • Add “grounding confidence” to the UI: show the user where the answer came from.
  • Use the stuff chain_type only for very short documents; prefer map_reduce for longer ones (see the sketch after this list).
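
A hedged sketch of two of these tips applied to the chain above (retrieving fewer chunks and switching to map_reduce); the k value and model choice are illustrative:

```python
# Retrieve fewer, more focused chunks, and let map_reduce answer per chunk before
# composing a final response, which scales better to long documents than "stuff".
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3})

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    chain_type="map_reduce",
    retriever=retriever,
    return_source_documents=True,  # surface the chunks behind each answer
)
```

Setting return_source_documents=True also supports the “grounding confidence” tip: the retrieved chunks come back with every answer, so the UI can show exactly where it came from.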

Summary

RAG systems don’t automatically guarantee factual answers. The way you chunk, retrieve, and prompt determines the outcome. By identifying where hallucinations creep in and applying surgical fixes, we transformed a shaky RAG prototype into a reliable AI assistant.
