🧠 RRF vs DBSF in Qdrant: Python, E-Commerce, Real Examples
Intro: Hybrid search combines different retrieval signals (e.g., semantic embeddings and text matches) to deliver better relevance. Qdrant supports two fusion methods: RRF and DBSF. ([qdrant.tech][1])
📌 What Are RRF and DBSF?
🔹 Reciprocal Rank Fusion (RRF)
RRF ranks results from multiple methods (e.g., dense & sparse vectors) by their positions in each ranked list. The score is a sum of reciprocal rank contributions. This helps items that appear high in multiple rankings rise to the top. ([qdrant.tech][1])
Score formula (simplified):
score = Σ 1 / (k + rank)
where rank is a document's position in each result list and k is a smoothing constant that dampens the influence of lower-ranked hits (Qdrant uses a small k, often 2; the original RRF paper uses k = 60). ([qdrant.tech][1])
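The formula is easy to see on a toy example. The sketch below fuses two made-up ranked lists of product IDs with k = 2; ranks here are 0-based, and the exact indexing inside Qdrant may differ.

```python
# Toy RRF fusion over two ranked lists (hypothetical product IDs).
def rrf_fuse(rankings, k=2):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank is 0-based here
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_ranking = ["p1", "p2", "p3"]   # semantic results
sparse_ranking = ["p2", "p4", "p1"]  # keyword results
fused = rrf_fuse([dense_ranking, sparse_ranking])
# p2 wins: it is near the top of BOTH lists (1/3 + 1/2 ≈ 0.83),
# beating p1, which is first in one list but third in the other (0.75)
```

Note how p2 overtakes p1 even though p1 tops the dense list: appearing high in multiple rankings is what RRF rewards.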
🔹 Distribution-Based Score Fusion (DBSF)
DBSF looks at the score distribution of each method, normalizes the scores using each list's mean and standard deviation (taking mean ± 3σ as the effective score range), and sums the normalized scores per document. Unlike RRF, it is sensitive to the actual score values, not just their ranks. ([qdrant.tech][2])
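A minimal sketch of the normalization step, assuming mean ± 3σ bounds as described in Qdrant's docs (the exact internals may differ). The input numbers are made up: cosine similarities and BM25-style scores live on very different scales, but after normalization both land in a comparable range and can simply be summed per document.

```python
import statistics

def dbsf_normalize(scores):
    # Use mean ± 3 standard deviations as the assumed score range,
    # then min-max scale into that range.
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [(s - lo) / (hi - lo) for s in scores]

dense = dbsf_normalize([0.92, 0.90, 0.40])   # cosine-style scores
sparse = dbsf_normalize([12.0, 3.0, 2.5])    # BM25-style scores
# Both lists now fall into a comparable range and preserve
# the relative gaps between scores
```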
📊 Why It Matters in E-Commerce Search
Imagine an online shop where users search products:
- Dense vector search captures semantic meaning (e.g., “comfortable running shoes”).
- Sparse text matches catch exact keywords (e.g., “size 10 nike”).
Hybrid fusion boosts results that are semantically and textually relevant — ideal for product discovery and conversion.
🧪 Example Dataset: E-Commerce Products CSV
Assume a CSV named products.csv:
| id | title | description | price | category |
|---|---|---|---|---|
| 1 | Running Shoes | Lightweight running shoes | 89.99 | Sportswear |
| 2 | Trail Sneakers | Off-road running shoes | 99.99 | Outdoors |
| … | … | … | … | … |
⚙️ Step 1: Load & Embed Data (Python)
```python
import pandas as pd
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from fastembed import SparseTextEmbedding

# Load products
df = pd.read_csv("products.csv")

# Embedding models: a dense semantic model and a BM25-style sparse model
# (BM25 is not a SentenceTransformer; fastembed provides it as a sparse model)
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding(model_name="Qdrant/bm25")
```
📥 Step 2: Initialize Qdrant
```python
client = QdrantClient(url="http://localhost:6333")

# Named vector spaces: one dense and one sparse, matched by the
# `using=` names in the hybrid queries below
client.recreate_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(),
    },
)
```
📌 Step 3: Ingest Data (Dense + Sparse)
```python
points = []
for idx, row in df.iterrows():
    dense_vec = dense_model.encode(row["description"]).tolist()
    # fastembed yields one SparseEmbedding (indices + values) per input text
    sparse_emb = next(sparse_model.embed(row["title"]))
    points.append(
        models.PointStruct(
            id=idx,
            vector={
                "dense": dense_vec,
                "sparse": models.SparseVector(
                    indices=sparse_emb.indices.tolist(),
                    values=sparse_emb.values.tolist(),
                ),
            },
            payload={
                "title": row["title"],
                "description": row["description"],
            },
        )
    )

client.upsert(collection_name="products", points=points)
```
With this setup, the sparse vector is stored as a named sparse vector on each point rather than in the payload, so Qdrant can search it directly alongside the dense vector.
🔍 Step 4: Hybrid Query with RRF
```python
from qdrant_client import models

query_text = "best running shoes"
query_dense = dense_model.encode(query_text).tolist()

# Build a sparse query vector (indices + values) from the sparse model
sparse_emb = next(sparse_model.embed(query_text))
query_sparse = models.SparseVector(
    indices=sparse_emb.indices.tolist(),
    values=sparse_emb.values.tolist(),
)

response_rrf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)

# query_points returns a QueryResponse; the hits live in .points
for p in response_rrf.points:
    print(p.payload["title"], p.score)
```
This fetches hybrid results ranked with RRF. ([qdrant.tech][1])
🔍 Step 5: Hybrid Query with DBSF
```python
response_dbsf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.DBSF),
    limit=10,
)

for p in response_dbsf.points:
    print(p.payload["title"], p.score)
```
Now scoring uses distribution normalization, which can rank different product matches more subtly. ([qdrant.tech][2])
📈 Comparison: RRF vs DBSF
| Feature | RRF | DBSF |
|---|---|---|
| Combines by rank | ✔️ | ❌ |
| Combines by normalized scores | ❌ | ✔️ |
| Sensitive to score magnitude | Low | High |
| Good for many overlapping hits | ✔️ | ✔️ |
| Better for score variance across retrievers | ❌ | ✔️ |
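The table's key distinction can be demonstrated directly. In the toy scenario below (all numbers made up), the two retrievers disagree symmetrically on ranks, but the sparse retriever's scores separate its winner by a wide margin while the dense scores are nearly tied. Rank-only RRF ties the two candidates; DBSF, which sees the magnitudes, clearly prefers the one with the decisive score gap.

```python
import statistics

dense = {"a": 0.99, "b": 0.98, "c": 0.50}   # a barely beats b
sparse = {"b": 0.99, "a": 0.10, "d": 0.05}  # b wins by a wide margin

def rrf(lists, k=2):
    out = {}
    for ranking in lists:
        for rank, doc in enumerate(ranking):
            out[doc] = out.get(doc, 0.0) + 1.0 / (k + rank)
    return out

def dbsf(score_maps):
    out = {}
    for scores in score_maps:
        mu = statistics.mean(scores.values())
        sigma = statistics.pstdev(scores.values())
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        for doc, s in scores.items():
            out[doc] = out.get(doc, 0.0) + (s - lo) / (hi - lo)
    return out

rrf_scores = rrf([sorted(dense, key=dense.get, reverse=True),
                  sorted(sparse, key=sparse.get, reverse=True)])
dbsf_scores = dbsf([dense, sparse])
# rrf_scores["a"] == rrf_scores["b"] (symmetric ranks cancel out),
# but dbsf_scores["b"] > dbsf_scores["a"] thanks to the sparse score gap
```

This is exactly the "sensitive to score magnitude" row in the table above.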
🖼️ Visual Examples (Explaining the Fusion)
- RRF Fusion Process: combining multiple ranked lists into one final order.
- Scoring Distribution (DBSF Concept): DBSF normalizes each method's scores based on their distribution before summing.
🧠 When to Use Which?
- Use RRF if you trust consistent rank alignment across methods (e.g., dense & sparse both agree).
- Use DBSF when score magnitudes vary greatly, and you want tighter normalization between dense and sparse scores.
📚 Resources
- 📘 Qdrant Hybrid Queries Docs (RRF & DBSF) — https://qdrant.tech/documentation/concepts/hybrid-queries/ ([qdrant.tech][1])
- 🧠 Qdrant Hybrid Search Tutorial — https://qdrant.tech/articles/hybrid-search/ ([qdrant.tech][3])
- 📊 Reciprocal Rank Fusion Concept — backed by search research & hybrid systems ([paradedb.com][4])
Conclusion: RRF and DBSF each offer unique ways to fuse hybrid signals. For most e-commerce hybrid search needs, DBSF provides nuanced score combination, while RRF excels at rank consensus.
I hope this post was helpful to you.
Leave a reaction if you liked this post!