ms-marco-MiniLM-L6-v2 Adapter¶

Scores passage relevance against a query for RAG reranking pipelines using cross-encoder/ms-marco-MiniLM-L6-v2.

Model details¶

Field	Value
Model	cross-encoder/ms-marco-MiniLM-L6-v2
Task	rank
Domain	general
License	Apache 2.0

Install¶

pip install synapse-adapter-sdk
pip install sentence-transformers torch

Verified output schema¶

CrossEncoder.predict() returns a raw logit as a numpy.ndarray of shape (1,):

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
scores = model.predict([("How many people live in Berlin?",
                          "Berlin had a population of 3.7 million in 2022.")])
# numpy.ndarray([8.607138], dtype=float32)

The adapter normalises the raw logit to [0.0, 1.0] using the sigmoid function and stores it in payload.score.

Sigmoid normalization¶

The raw logit is unbounded (typically in the range −10 to +10). A sigmoid transform maps it to a principled probability-like score:

score = 1 / (1 + exp(-raw_score))

A raw score of 0 maps to exactly 0.5 (model is neutral)
Positive logits map above 0.5 (relevant passage)
Negative logits map below 0.5 (not relevant)

Supported task types¶

rank

Supported domains¶

general

Usage example¶

import time
from sentence_transformers import CrossEncoder
from ms_marco_cross_encoder_adapter import MsMarcoCrossEncoderAdapter

model   = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
adapter = MsMarcoCrossEncoderAdapter()

# 1. Prepare model input
#    task_header.query holds the query; payload.content holds the passage
model_input = adapter.ingress(ir)
# {"query": "How many people live in Berlin?",
#  "passage": "Berlin had a population of 3.7 million in 2022."}

# 2. Run the model (caller's responsibility)
t0 = time.monotonic()
scores = model.predict([(model_input["query"], model_input["passage"])])
latency_ms = int((time.monotonic() - t0) * 1000)

# 3. Convert output back to canonical IR
result_ir = adapter.egress(scores, ir, latency_ms=latency_ms)

# 4. Access the relevance score in [0.0, 1.0]
relevance = result_ir.payload.score

The query is read from ir.task_header.query. Set this field when building the IR for ranking:

from synapse_sdk.types import TaskHeader

task_header = TaskHeader(
    task_type="rank",
    domain="general",
    priority=2,
    latency_budget_ms=5000,
    query="How many people live in Berlin?",
)

Source¶

github.com/synapse-ir/adapters