🧠 RRF vs DBSF in Qdrant: Hybrid Search Examples in Python (E-Commerce)

Intro: Hybrid search combines different retrieval signals (e.g., semantic embeddings and exact text matches) to deliver better relevance than either signal alone. Qdrant's Query API supports two fusion methods for combining them: RRF and DBSF. ([qdrant.tech][1])


📌 What Are RRF and DBSF?

🔹 Reciprocal Rank Fusion (RRF)

RRF ranks results from multiple methods (e.g., dense & sparse vectors) by their positions in each ranked list. The score is a sum of reciprocal rank contributions. This helps items that appear high in multiple rankings rise to the top. ([qdrant.tech][1])

Score formula (simplified):

score(d) = Σᵢ 1 / (k + rankᵢ(d))

where rankᵢ(d) is the position of document d in the i-th result list, and k is a constant (often 2 in Qdrant). ([qdrant.tech][1])
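
A quick worked example of the formula, using two hypothetical ranked lists (the document IDs are made up):

from collections import defaultdict

def rrf(rankings, k=2):
    """Fuse ranked lists by summing reciprocal-rank contributions."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_ranking = ["A", "B", "C"]   # hypothetical dense-vector result order
sparse_ranking = ["B", "A", "D"]  # hypothetical sparse/keyword result order
print(rrf([dense_ranking, sparse_ranking]))
# A and B rank high in both lists, so they tie at the top ahead of C and D.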

🔹 Distribution-Based Score Fusion (DBSF)

DBSF looks at the score distributions from each method, normalizes each list using its mean and standard deviation (mean ± 3σ as the limits), and sums the normalized scores per document. Unlike RRF, it is sensitive to the actual score values, not just the ranks. ([qdrant.tech][2])
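
A minimal sketch of the idea in Python, assuming two score dictionaries and mean ± 3 standard deviations as the normalization bounds (Qdrant's internal implementation may differ in details):

import statistics

def dbsf(score_lists):
    """Normalize each method's scores to a common range, then sum per document."""
    fused = {}
    for scores in score_lists:  # each entry: {doc_id: raw_score}
        vals = list(scores.values())
        mu, sigma = statistics.mean(vals), statistics.pstdev(vals)
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        for doc_id, s in scores.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 0.5
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

dense_scores = {"A": 0.92, "B": 0.88, "C": 0.40}  # cosine similarities (hypothetical)
sparse_scores = {"B": 14.0, "A": 5.0, "D": 4.5}   # BM25-style scores on a very different scale
print(dbsf([dense_scores, sparse_scores]))
# B wins: it scores well under both methods once the scales are normalized.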


📊 Why It Matters in E-Commerce Search

Imagine an online shop where users search products:

  • Dense vector search captures semantic meaning (e.g., “comfortable running shoes”).
  • Sparse text matches catch exact keywords (e.g., “size 10 nike”).

Hybrid fusion boosts results that are semantically and textually relevant — ideal for product discovery and conversion.


🧪 Example Dataset: E-Commerce Products CSV

Assume a CSV named products.csv:

id,title,description,price,category
1,Running Shoes,Lightweight running shoes,89.99,Sportswear
2,Trail Sneakers,Off-road running shoes,99.99,Outdoors

⚙️ Step 1: Load & Embed Data (Python)

import pandas as pd
from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

# Load products
df = pd.read_csv("products.csv")

# Embedding models: dense semantic vectors plus a BM25-style sparse encoder.
# (SentenceTransformer has no BM25 model; fastembed's SparseTextEmbedding does.)
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding(model_name="Qdrant/bm25")

📥 Step 2: Initialize Qdrant

client = QdrantClient(url="http://localhost:6333")

client.recreate_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        # IDF is applied server-side, since the BM25 encoder emits term frequencies
        "sparse": models.SparseVectorParams(modifier=models.Modifier.IDF),
    },
)

📌 Step 3: Ingest Data (Dense + Sparse)

points = []
for idx, row in df.iterrows():
    dense_vec = dense_model.encode(row["description"]).tolist()
    # fastembed returns a generator of sparse embeddings (indices + values)
    sparse_emb = next(sparse_model.embed([row["title"]]))

    points.append(
        models.PointStruct(
            id=idx,
            vector={
                "dense": dense_vec,
                "sparse": models.SparseVector(
                    indices=sparse_emb.indices.tolist(),
                    values=sparse_emb.values.tolist(),
                ),
            },
            payload={
                "title": row["title"],
                "description": row["description"],
                "price": row["price"],
                "category": row["category"],
            },
        )
    )

client.upsert(collection_name="products", points=points)

Note that the sparse vector is stored as a separate named vector on the same point (not in the payload), so Qdrant can search the dense and sparse representations independently and then fuse the results.
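
To see what such a sparse vector looks like, you can inspect one encoder output directly (the indices and weights shown in the comments are illustrative, not real model output):

emb = next(sparse_model.embed(["size 10 nike running shoes"]))
print(emb.indices)  # e.g. [102, 4051, 18997, ...] - active vocabulary positions
print(emb.values)   # e.g. [0.83, 1.92, 0.47, ...] - their weights

Only the handful of non-zero dimensions are stored, which is what makes the vector "sparse".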


🔍 Step 4: Hybrid Query with RRF

from qdrant_client import models

query_text = "best running shoes"
query_dense = dense_model.encode(query_text).tolist()
sparse_emb = next(sparse_model.embed([query_text]))
query_sparse = models.SparseVector(
    indices=sparse_emb.indices.tolist(),
    values=sparse_emb.values.tolist(),
)

response_rrf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)

# query_points returns a QueryResponse; the hits live in .points
for p in response_rrf.points:
    print(p.payload["title"], p.score)

This fetches hybrid results ranked with RRF. ([qdrant.tech][1])


🔍 Step 5: Hybrid Query with DBSF

response_dbsf = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_sparse, using="sparse", limit=20),
        models.Prefetch(query=query_dense, using="dense", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.DBSF),
    limit=10,
)

for p in response_dbsf.points:
    print(p.payload["title"], p.score)

Now scoring uses distribution-based normalization, which can separate results whose raw scores differ noticeably even when their ranks are similar. ([qdrant.tech][2])


📈 Comparison: RRF vs DBSF

Feature                            RRF    DBSF
Combines by rank                   ✔️     —
Combines by normalized scores      —      ✔️
Sensitivity to score magnitude     Low    High
Good for many overlapping hits     ✔️     ✔️
Better under high score variance   —      ✔️

🖼️ Visual Examples (Explaining the Fusion)

  1. RRF fusion: several ranked lists are merged into one final order by summing reciprocal-rank contributions.

  2. DBSF: each method's raw scores are normalized against their own distribution before being summed.


🧠 When to Use Which?

  • Use RRF when you trust rank agreement across methods (dense and sparse tend to order results similarly) but their raw scores are not comparable.
  • Use DBSF when score magnitudes vary greatly between methods and you want normalization to put dense and sparse scores on a common scale (see the side-by-side sketch below).
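
To compare the two methods side by side, a small helper (a hypothetical convenience wrapper reusing the client, models, and query vectors from the steps above) can run the same query with either fusion:

def hybrid_search(fusion):
    """Run the same hybrid query with the given fusion method."""
    return client.query_points(
        collection_name="products",
        prefetch=[
            models.Prefetch(query=query_sparse, using="sparse", limit=20),
            models.Prefetch(query=query_dense, using="dense", limit=20),
        ],
        query=models.FusionQuery(fusion=fusion),
        limit=10,
    ).points

for fusion in (models.Fusion.RRF, models.Fusion.DBSF):
    hits = hybrid_search(fusion)
    print(fusion, [p.payload["title"] for p in hits])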

📚 Resources

  • [1] Qdrant documentation, Hybrid Queries: https://qdrant.tech/documentation/concepts/hybrid-queries/
  • [2] Qdrant article on hybrid search and fusion: https://qdrant.tech/articles/hybrid-search/

Conclusion: RRF and DBSF each offer unique ways to fuse hybrid signals. For most e-commerce hybrid search needs, DBSF provides nuanced score combination, while RRF excels at rank consensus.