Integrate with Sentence Transformers v5.4

#3
by tomaarsen (HF Staff)

Hello!

Pull Request overview

  • Integrate OmniEmbed-v0.1-multivent with Sentence Transformers v5.4, so it can be loaded via SentenceTransformer("Tevatron/OmniEmbed-v0.1-multivent") and used for text, image, audio, and video retrieval out of the box. Requires transformers>=5.6.0.

Details

This mirrors the https://huggingface.co/Tevatron/OmniEmbed-v0.1/discussions/3 PR, updated for the multivent model. Apart from small edits such as the title in the README, the changes are identical, and the model runs with the same common format as other Sentence Transformers-compatible multimodal models:

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Tevatron/OmniEmbed-v0.1-multivent",
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # pip install kernels; recommended but not mandatory
    },
    revision="refs/pr/3",
)

# For video on smaller GPUs, cap the processor up front:
model[0].processing_kwargs.update({
    "video": {"max_pixels": 64 * 28 * 28, "do_sample_frames": True, "fps": 1},
})

example_query = "How many input modality does Qwen2.5-Omni support?"
example_images = [
    "https://huggingface.co/Tevatron/OmniEmbed-v0.1/resolve/main/assets/qwen2.5omni_hgf.png",
    "https://huggingface.co/Tevatron/OmniEmbed-v0.1/resolve/main/assets/llama4_hgf.png",
]
query_embedding = model.encode_query(example_query)
document_embeddings = model.encode_document(example_images, batch_size=1)
print(model.similarity(query_embedding, document_embeddings))
# tensor([[0.5338, 0.3028]])
```
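For reference, `model.similarity` scores each query embedding against each document embedding; assuming the Sentence Transformers default of cosine similarity (the model config may override this), the computation is just normalized dot products. A minimal, dependency-free sketch with made-up stand-in vectors:

```python
import math

def cosine_similarity(queries, docs):
    # Entry [i][j] is the cosine of the angle between queries[i] and docs[j]:
    # the dot product divided by the product of the two vector lengths.
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    return [[dot(q, d) / (math.sqrt(dot(q, q)) * math.sqrt(dot(d, d)))
             for d in docs] for q in queries]

query = [[3.0, 0.0, 4.0]]        # stand-in query embedding, not a real model output
docs = [[3.0, 0.0, 4.0],         # same direction as the query -> 1.0
        [0.0, 1.0, 0.0]]         # orthogonal to the query     -> 0.0
print(cosine_similarity(query, docs))  # [[1.0, 0.0]]
```

The real embeddings are high-dimensional, but the scoring step is the same: higher values mean a closer match, which is why the first image above scores 0.5338 against the query and the second only 0.3028.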

Let me know if you have any questions or feedback!

  • Tom Aarsen
tomaarsen changed pull request status to open