What is Similarity Scoring in RAG?
In Retrieval-Augmented Generation (RAG), similarity scoring refers to the method used to determine how relevant a chunk of text is to a user’s query. These scores guide the retrieval system in selecting the best passages to feed into a language model for response generation.
Choosing the right similarity scoring method affects both retrieval quality and latency. It matters most in domains such as law, medicine, or research, where precision is essential.
Types of Similarity Scoring Methods
1. Cosine Similarity (Dense Vectors)
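- Based on the angle between two embedding vectors. Vector length (magnitude) is ignored, and scores range from -1 to 1.
- Common in embedding-based vector search systems (Pinecone, Weaviate; FAISS supports it through inner product on L2-normalized vectors).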
from sklearn.metrics.pairwise import cosine_similarity
# Assuming query_emb and doc_emb are 1D vectors (embeddings)
similarity = cosine_similarity([query_emb], [doc_emb])
print("Cosine Similarity:", similarity[0][0])
Used in: OpenAI Embeddings + Pinecone, HuggingFace Transformers + FAISS
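To show what the library call computes, here is a minimal from-scratch sketch in NumPy. It scores one query against a matrix of document embeddings and returns the best matches. The helper names `cosine_scores` and `top_k` and the toy 3-dimensional vectors are illustrative assumptions, and the sketch assumes no vector is all zeros.

```python
import numpy as np

def cosine_scores(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of doc_embs.

    Assumes no vector is all zeros (that would cause a division by zero).
    """
    # Dot product of the query with every document vector
    dots = doc_embs @ query_emb
    # Divide by the product of the vector lengths so only the angle matters
    norms = np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    return dots / norms

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int) -> list[int]:
    """Indices of the k most similar documents, best first."""
    scores = cosine_scores(query_emb, doc_embs)
    return np.argsort(-scores)[:k].tolist()

# Hypothetical toy "embeddings" (real models produce hundreds of dimensions)
doc_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_emb = np.array([2.0, 0.0, 0.0])  # same direction as doc 0, twice as long

print(cosine_scores(query_emb, doc_embs))  # about [1.0, 0.0, 0.707]
print(top_k(query_emb, doc_embs, k=2))     # [0, 2]
```

The query scores exactly 1.0 against document 0 even though the two vectors have different lengths. This is the magnitude-invariance described above.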
2. Dot Product
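- Common with inner-product indexes such as FAISS’s IndexFlatIP.
- Sensitive to vector magnitude, so longer vectors score higher. On L2-normalized embeddings it equals cosine similarity. Raw (unnormalized) dot product is mainly useful when the embedding model was trained for dot-product scoring.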
import numpy as np
# Assuming query_emb and doc_emb are 1D NumPy vectors
score = np.dot(query_emb, doc_emb)
print("Dot Product Similarity:", score)
Used in: Facebook DPR, some OpenAI + LangChain pipelines
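The sketch below illustrates the magnitude effect with made-up 2-dimensional vectors. It also shows that normalizing first makes the dot product identical to cosine similarity. The vectors and the helper names `l2_normalize` and `cosine` are hypothetical and only for illustration.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (assumes v is non-zero)."""
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_emb = np.array([1.0, 1.0])
short_doc = np.array([1.0, 1.0])  # same direction as the query
long_doc = np.array([3.0, 0.0])   # different direction, but much longer

# Raw dot product rewards magnitude: long_doc wins (3.0 vs 2.0)
print(np.dot(query_emb, short_doc), np.dot(query_emb, long_doc))

# Cosine ignores magnitude: short_doc wins (about 1.0 vs about 0.707)
print(cosine(query_emb, short_doc), cosine(query_emb, long_doc))

# After L2 normalization, the dot product equals cosine similarity
q, d = l2_normalize(query_emb), l2_normalize(long_doc)
print(np.isclose(np.dot(q, d), cosine(query_emb, long_doc)))  # True
```

Many vector stores therefore recommend normalizing embeddings once at indexing time. After that, a fast inner-product index returns cosine rankings.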
3. L2 Distance (Euclidean)
- Measures straight-line distance between vectors
- Lower = more similar
from numpy.linalg import norm
# Assuming query_emb and doc_emb are NumPy vectors
# Negate the Euclidean distance so that higher = more similar
score = -norm(query_emb - doc_emb)
print("L2 Distance Similarity:", score)
Used in: FAISS IndexFlatL2 and custom RAG setups where a distance (rather than a similarity) is more intuitive
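L2 distance and cosine similarity are closely related. For unit-length vectors, ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b = 2 − 2·cos(a, b), so ranking by smallest distance gives the same order as ranking by largest cosine. The minimal check below uses hypothetical toy vectors to demonstrate this.

```python
import numpy as np

# Hypothetical toy embeddings (4 documents, 2 dimensions) and a query
doc_embs = np.array([[2.0, 0.0], [0.0, 3.0], [3.0, 4.0], [-5.0, 0.0]])
query_emb = np.array([4.0, 3.0])

# Normalize everything to unit length, as many embedding pipelines do
doc_embs /= np.linalg.norm(doc_embs, axis=1, keepdims=True)
query_emb /= np.linalg.norm(query_emb)

cos = doc_embs @ query_emb                          # cosine similarity (unit vectors)
l2 = np.linalg.norm(doc_embs - query_emb, axis=1)  # Euclidean distance to the query

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.allclose(l2 ** 2, 2 - 2 * cos))  # True

# Smallest distance first gives the same order as largest cosine first
print(np.argsort(l2).tolist())    # [2, 0, 1, 3]
print(np.argsort(-cos).tolist())  # [2, 0, 1, 3]
```

On unnormalized vectors the two metrics can disagree, so this equivalence only holds after normalization.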
4. BM25 (Sparse Text-Based Retrieval)
- Keyword-based relevance score
- Doesn’t require embeddings
from rank_bm25 import BM25Okapi
# Assuming documents is a list of strings and query is a string
# Tokenize documents into lists of lowercase words (naive whitespace split)
tokenized_corpus = [doc.lower().split() for doc in documents]
# Initialize BM25 with tokenized corpus
bm25 = BM25Okapi(tokenized_corpus)
# Tokenize the query the same way as the documents
tokenized_query = query.lower().split()
# Get one BM25 score per document (higher = more relevant)
scores = bm25.get_scores(tokenized_query)
print("BM25 Scores:", scores)
Used in: Elasticsearch, Whoosh, Haystack, hybrid RAG setups
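BM25 combines three signals: how rare a query term is across the corpus (IDF), how often it appears in a document (with diminishing returns), and how long the document is relative to the average. The from-scratch sketch below shows how these fit together. It uses the non-negative IDF variant log(1 + (N − n + 0.5) / (n + 0.5)), which may differ slightly from rank_bm25's internal formula. k1 = 1.5 and b = 0.75 are common default values. The function name and sample documents are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (illustrative sketch)."""
    # Naive lowercase whitespace tokenization (an assumption; use a real tokenizer in practice)
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        doc_len = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger weight than common ones
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls term-frequency saturation; b controls length normalization
            numerator = tf[term] * (k1 + 1)
            denominator = tf[term] + k1 * (1 - b + b * doc_len / avgdl)
            score += idf * numerator / denominator
        scores.append(score)
    return scores

documents = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "the stock market fell today",
]
# Only the first document contains "cat" or "mat", so only it gets a non-zero score
print(bm25_scores("cat mat", documents))
```

Note that "cats" does not match "cat" here. Production systems such as Elasticsearch apply analyzers (stemming, stop words) before scoring, which this sketch omits.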
5. Cross-Encoder Scoring (Reranking)
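- Passes the query and document together through a transformer, which scores them jointly. This is usually more accurate than comparing separate embeddings, but slower, because no scores can be precomputed: every query-document pair needs its own forward pass.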
from sentence_transformers import CrossEncoder
# Load the pre-trained cross-encoder model
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
# Example query-document pair
query = "What is the capital of France?"
doc = "Paris is the capital and most populous city of France."
# Predict one relevance score per (query, document) pair (higher = more relevant)
scores = model.predict([(query, doc)])
print("Relevance Score:", scores[0])
Used in: RAG + reranker pipelines (LangChain rerankers, Vectara). ColBERT, often mentioned alongside these, is a late-interaction model rather than a true cross-encoder.
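In practice, a cross-encoder is run over a short candidate list, and all pairs are scored in one batched call rather than one call per document. The sketch below assumes a generic `score_pairs` callable that takes a list of (query, document) pairs and returns one score per pair, which matches how `CrossEncoder.predict` is called above. To keep the example runnable offline, it plugs in a hypothetical word-overlap scorer instead of a real model. `rerank` and `toy_overlap_scorer` are illustrative names, not library APIs.

```python
import re
from typing import Callable, Sequence

def rerank(
    query: str,
    docs: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, doc) pair in one batch and keep the top_n documents."""
    pairs = [(query, doc) for doc in docs]
    scores = score_pairs(pairs)  # one batched call instead of one call per document
    ranked = sorted(zip(docs, scores), key=lambda item: float(item[1]), reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_n]]

# Hypothetical stand-in scorer: counts shared lowercase words.
# With a real model you would pass something like CrossEncoder(...).predict instead.
def toy_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    def words(text: str) -> set[str]:
        return set(re.findall(r"\w+", text.lower()))
    return [float(len(words(q) & words(d))) for q, d in pairs]

candidates = [
    "Berlin is the capital of Germany.",
    "Paris is the capital and most populous city of France.",
    "France is known for its cuisine.",
]
results = rerank("What is the capital of France?", candidates, toy_overlap_scorer, top_n=2)
for doc, score in results:
    print(score, doc)  # the Paris sentence ranks first
```

A common pattern is to retrieve a wider candidate set with a fast method (cosine or BM25) and then keep only the few best documents after reranking.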
Integrating into a RAG Pipeline
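from sentence_transformers import CrossEncoder
# Assuming vectorstore is an existing LangChain vector store (e.g., backed by a Pinecone index)
# and query is the user's question as a string
# Step 1: Create a retriever from the vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}  # Retrieve top 5 candidates
)
# Step 2: Get relevant documents
# (older LangChain versions: retriever.get_relevant_documents(query))
docs = retriever.invoke(query)
# Step 3: Rerank using a cross-encoder, scoring all pairs in one batch
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
scores = cross_encoder.predict([(query, d.page_content) for d in docs])
reranked = [d for d, _ in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)]
# Top result
print("Top Document:", reranked[0].page_content)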
Which to Use?
Method | Speed | Accuracy | Use Case |
---|---|---|---|
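Cosine | ✅ Fast | ✅ Good (depends mostly on the embedding model) | General RAG, scalable, default choice |
Dot Product | ✅ Fast | ✅ Good | Models trained for dot-product scoring (e.g., DPR); identical to cosine on normalized vectors |
L2 Distance | ✅ Fast | ✅ Good | Distance-based indexes (e.g., FAISS IndexFlatL2); same ranking as cosine on normalized vectors |
BM25 | ✅ Fast | ✅ Decent | Low-latency, keyword-heavy domains (exact names, IDs, codes) |
Cross-Encoder | ❌ Slow (one model pass per query-document pair) | ⭐ High | Reranking a short candidate list; text length limited by the model's maximum input |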
Summary
Similarity scoring is the foundation of good retrieval in RAG systems. Choosing the right method depends on your:
- Latency tolerance
- Accuracy needs
- Document length and structure
- Hardware (GPU for reranking)
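Hybrid approaches (BM25 + cosine + reranker) are increasingly common and often recommended: BM25 catches exact keyword matches, dense similarity catches paraphrases, and a cross-encoder reranks the merged candidates.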
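One common way to merge BM25 and dense results is Reciprocal Rank Fusion (RRF). Each document earns 1 / (k + rank) from every list it appears in, so raw scores on different scales never need to be normalized against each other. The minimal sketch below uses k = 60, a commonly used default. The document IDs and rankings are hypothetical, and RRF is only one of several possible fusion strategies.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in (rank starts at 1).
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical ranked results for the same query from two retrievers
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, dense_ranking]):
    print(doc_id, round(score, 4))
# Prints: doc_b 0.0325, doc_a 0.0323, doc_d 0.0161, doc_c 0.0159 (one per line)
```

Documents found by both retrievers (doc_a, doc_b) rise to the top. The fused list can then be passed to a cross-encoder for final reranking.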
Planning to develop an AI software application? We’d be delighted to assist. Connect with Jellyfish Technologies to explore tailored, innovative solutions.