Instructions to use Qwen/Qwen3-VL-Reranker-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
  - sentence-transformers
- Notebooks
  - Google Colab
  - Kaggle

How to use Qwen/Qwen3-VL-Reranker-8B with Transformers:

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-Reranker-8B")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-Reranker-8B")
```

How to use Qwen/Qwen3-VL-Reranker-8B with sentence-transformers:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-VL-Reranker-8B")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
Integrate with Sentence Transformers v5.4
Hello!
Pull Request overview
- Integrate this model using a Sentence Transformers `CrossEncoder`
Details
This PR adds the configuration files needed to load this model directly as a CrossEncoder via Sentence Transformers. The model uses an any-to-any Transformer with a LogitScore head that computes the logit difference between the "yes" and "no" tokens, i.e. the model's confidence that a document is relevant to a query. The model supports text, image, video, and multimodal (e.g. combinations of the previous) inputs via a structured message format.
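The `LogitScore` idea can be pictured with a tiny sketch: the relevance score is simply the logit the model assigns to the "yes" token minus the logit for the "no" token at the final position. The token ids and logit values below are made up for illustration; this is not the library's implementation:

```python
# Illustrative sketch of a logit-difference score (not the actual LogitScore code).
import numpy as np

def logit_score(logits: np.ndarray, yes_id: int, no_id: int) -> float:
    """Relevance score = logit("yes") - logit("no") at the final position."""
    return float(logits[yes_id] - logits[no_id])

# Toy logits over a 5-token vocabulary; ids 3 ("yes") and 4 ("no") are made up.
logits = np.array([0.1, -0.2, 0.0, 2.0, 0.5])
print(logit_score(logits, yes_id=3, no_id=4))  # 1.5
```

A larger (more positive) difference means the model is more confident the document answers the query, which matches the unbounded scores seen in the example output further down.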
A custom `additional_chat_templates/reranker.jinja` maps Sentence Transformers' structured messages (with `"query"` and `"document"` roles) to the model's expected format with the `<Instruct>`, `<Query>`, and `<Document>` fields, including the system prompt for yes/no judgment. The template includes a default instruction ("Given a search query, retrieve relevant candidates that answer the query.") as a fallback when no prompt is provided. `unpad_inputs` is set to `false` as Qwen3 can't flatten inputs nicely.
Added files:
- `modules.json`: pipeline: `Transformer` & `LogitScore`
- `sentence_bert_config.json`: `any-to-any` task, structured message format, multimodal config
- `config_sentence_transformers.json`: default prompt ("Retrieve text relevant to the user's query."), Identity activation
- `additional_chat_templates/reranker.jinja`: custom template for the reranker format
- `1_LogitScore/config.json`: yes/no token IDs
Once the Sentence Transformers v5.4 release is out, the model can be used immediately like so:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-VL-Reranker-8B", revision="refs/pr/9")

query = "A woman playing with her dog on a beach at sunset."
documents = [
    "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    {
        "text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    },
]

prompt = "Retrieve images or text relevant to the user's query."

pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=prompt)
print(scores)
# [1.3125, 0.25, 0.4375]

rankings = model.rank(query, documents, prompt=prompt)
print(rankings)
# [{'corpus_id': 0, 'score': 1.3125}, {'corpus_id': 2, 'score': 0.4375}, {'corpus_id': 1, 'score': 0.25}]
```
And after merging, the revision argument can be dropped.
Note that none of the existing behaviour is affected or changed; this PR only adds an additional way to run the model in a familiar and common format.
If you are able to merge this before tomorrow's Sentence Transformers v5.4 release, then I can include this model in my blogpost and documentation without the `revision` argument. Otherwise, I'll document it with the `revision` and drop that later.
- Tom Aarsen