🧠 RRF vs DBSF in Qdrant: Python, E-Commerce, Real Examples
Intro: Hybrid search combines different retrieval signals (e.g., semantic embeddings and text matches) to deliver better relevance. Qdrant supports two fusion methods: RRF and DBSF. ([qdrant.tech][1])
📌 What Are RRF and DBSF?
🔹 Reciprocal Rank Fusion (RRF)
RRF ranks results from multiple methods (e.g., dense & sparse vectors) by their positions in each ranked list. The score is a sum of reciprocal rank contributions. This helps items that appear high in multiple rankings rise to the top. ([qdrant.tech][1])
Score formula (simplified):
score = Σ 1 / (k + rank)
where rank is a document's position in each result list and k is a smoothing constant that dampens the influence of lower-ranked hits (Qdrant uses a small k, often 2; the original RRF paper uses k = 60). ([qdrant.tech][1])
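The formula is easy to see on a toy example. The sketch below fuses two made-up ranked lists of product IDs with k = 2; ranks here are 0-based, and the exact indexing inside Qdrant may differ.

```python
# Toy RRF fusion over two ranked lists (hypothetical product IDs).
def rrf_fuse(rankings, k=2):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank is 0-based here
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_ranking = ["p1", "p2", "p3"]   # semantic results
sparse_ranking = ["p2", "p4", "p1"]  # keyword results
fused = rrf_fuse([dense_ranking, sparse_ranking])
# p2 wins: it is near the top of BOTH lists (1/3 + 1/2 ≈ 0.83),
# beating p1, which is first in one list but third in the other (0.75)
```

Note how p2 overtakes p1 even though p1 tops the dense list: appearing high in multiple rankings is what RRF rewards.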
🔹 Distribution-Based Score Fusion (DBSF)
DBSF looks at the score distribution of each method, normalizes the scores using each list's mean and standard deviation (taking mean ± 3σ as the effective score range), and sums the normalized scores per document. Unlike RRF, it is sensitive to the actual score values, not just their ranks. ([qdrant.tech][2])
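A minimal sketch of the normalization step, assuming mean ± 3σ bounds as described in Qdrant's docs (the exact internals may differ). The input numbers are made up: cosine similarities and BM25-style scores live on very different scales, but after normalization both land in a comparable range and can simply be summed per document.

```python
import statistics

def dbsf_normalize(scores):
    # Use mean ± 3 standard deviations as the assumed score range,
    # then min-max scale into that range.
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [(s - lo) / (hi - lo) for s in scores]

dense = dbsf_normalize([0.92, 0.90, 0.40])   # cosine-style scores
sparse = dbsf_normalize([12.0, 3.0, 2.5])    # BM25-style scores
# Both lists now fall into a comparable range and preserve
# the relative gaps between scores
```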
📊 Why It Matters in E-Commerce Search
Imagine an online shop where users search products:
- Dense vector search captures semantic meaning (e.g., “comfortable running shoes”).
- Sparse text matches catch exact keywords (e.g., “size 10 nike”).
Hybrid fusion boosts results that are semantically and textually relevant — ideal for product discovery and conversion.
🧪 Example Dataset: E-Commerce Products CSV
Assume a CSV named products.csv:
| id | title | description | price | category |
|---|---|---|---|---|
| 1 | Running Shoes | Lightweight running shoes | 89.99 | Sportswear |
| 2 | Trail Sneakers | Off-road running shoes | 99.99 | Outdoors |
| … | … | … | … | … |
⚙️ Step 1: Load & Embed Data (Python)
```python
import pandas as pd
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from fastembed import SparseTextEmbedding

# Load products
df = pd.read_csv("products.csv")

# Embedding models: a dense semantic model and a BM25-style sparse model
# (BM25 is not a SentenceTransformer; fastembed provides it as a sparse model)
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding(model_name="Qdrant/bm25")
```
📥 Step 2: Initialize Qdrant
```python
client = QdrantClient(url="http://localhost:6333")

# Named vector spaces: one dense and one sparse, matched by the
# `using=` names in the hybrid queries below
client.recreate_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(),
    },
)
```
📌 Step 3: Ingest Data (Dense + Sparse)
```python
points = []
for idx, row in df.iterrows():
    dense_vec = dense_model.encode(row["description"]).tolist()
    # fastembed yields one SparseEmbedding (indices + values) per input text
    sparse_emb = next(sparse_model.embed(row["title"]))
    points.append(
        models.PointStruct(
            id=idx,
            vector={
                "dense": dense_vec,
                "sparse": models.SparseVector(
                    indices=sparse_emb.indices.tolist(),
                    values=sparse_emb.values.tolist(),
                ),
            },
            payload={
                "title": row["title"],
                "description": row["description"],
            },
        )
    )

client.upsert(collection_name="products", points=points)
```
With this setup, the sparse vector is stored as a named sparse vector on each point rather than in the payload, so Qdrant can search it directly alongside the dense vector.
🔍 Step 4: Hybrid Query with RRF
```python
from qdrant_client import models

query_text = "best running shoes"
query_dense = dense_model.encode(query_text).tolist()

# Build a sparse query vector (indices + values) from the sparse model
sparse_emb = next(sparse_model.embed(query_text))
query_sparse = models.SparseVector(
    indices=sparse_emb.indices.tolist(),
    values=sparse_emb.values.tolist(),
)

response_rrf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)

# query_points returns a QueryResponse; the hits live in .points
for p in response_rrf.points:
    print(p.payload["title"], p.score)
```
This fetches hybrid results ranked with RRF. ([qdrant.tech][1])
🔍 Step 5: Hybrid Query with DBSF
```python
response_dbsf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.DBSF),
    limit=10,
)

for p in response_dbsf.points:
    print(p.payload["title"], p.score)
```
Now scoring uses distribution normalization, which can rank different product matches more subtly. ([qdrant.tech][2])
📈 Comparison: RRF vs DBSF
| Feature | RRF | DBSF |
|---|---|---|
| Combines by rank | ✔️ | ❌ |
| Combines by normalized scores | ❌ | ✔️ |
| Sensitive to score magnitude | Low | High |
| Good for many overlapping hits | ✔️ | ✔️ |
| Better for score variance across retrievers | ❌ | ✔️ |
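The table's key distinction can be demonstrated directly. In the toy scenario below (all numbers made up), the two retrievers disagree symmetrically on ranks, but the sparse retriever's scores separate its winner by a wide margin while the dense scores are nearly tied. Rank-only RRF ties the two candidates; DBSF, which sees the magnitudes, clearly prefers the one with the decisive score gap.

```python
import statistics

dense = {"a": 0.99, "b": 0.98, "c": 0.50}   # a barely beats b
sparse = {"b": 0.99, "a": 0.10, "d": 0.05}  # b wins by a wide margin

def rrf(lists, k=2):
    out = {}
    for ranking in lists:
        for rank, doc in enumerate(ranking):
            out[doc] = out.get(doc, 0.0) + 1.0 / (k + rank)
    return out

def dbsf(score_maps):
    out = {}
    for scores in score_maps:
        mu = statistics.mean(scores.values())
        sigma = statistics.pstdev(scores.values())
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        for doc, s in scores.items():
            out[doc] = out.get(doc, 0.0) + (s - lo) / (hi - lo)
    return out

rrf_scores = rrf([sorted(dense, key=dense.get, reverse=True),
                  sorted(sparse, key=sparse.get, reverse=True)])
dbsf_scores = dbsf([dense, sparse])
# rrf_scores["a"] == rrf_scores["b"] (symmetric ranks cancel out),
# but dbsf_scores["b"] > dbsf_scores["a"] thanks to the sparse score gap
```

This is exactly the "sensitive to score magnitude" row in the table above.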
🖼️ Visual Examples (Explaining the Fusion)
- RRF Fusion Process: combining multiple ranked lists into one final order.
- Scoring Distribution (DBSF Concept): DBSF normalizes each method's scores based on their distribution before summing.
🧠 When to Use Which?
- Use RRF if you trust consistent rank alignment across methods (e.g., dense & sparse both agree).
- Use DBSF when score magnitudes vary greatly, and you want tighter normalization between dense and sparse scores.
📚 Resources
- 📘 Qdrant Hybrid Queries Docs (RRF & DBSF) — https://qdrant.tech/documentation/concepts/hybrid-queries/ ([qdrant.tech][1])
- 🧠 Qdrant Hybrid Search Tutorial — https://qdrant.tech/articles/hybrid-search/ ([qdrant.tech][3])
- 📊 Reciprocal Rank Fusion Concept — backed by search research & hybrid systems ([paradedb.com][4])
Conclusion: RRF and DBSF each offer unique ways to fuse hybrid signals. For most e-commerce hybrid search needs, DBSF provides nuanced score combination, while RRF excels at rank consensus.
I hope this post was helpful to you.
Leave a reaction if you liked this post!