Integrate with Sentence Transformers v5.4
#3
by tomaarsen - opened
Hello!
Pull Request overview
- Integrate OmniEmbed-v0.1-multivent with Sentence Transformers v5.4, so it can be loaded via
SentenceTransformer("Tevatron/OmniEmbed-v0.1-multivent") and used for text, image, audio, and video retrieval out of the box. Requires transformers>=5.6.0.
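A possible environment setup matching the versions stated above (the exact pins beyond those versions are an assumption):

```shell
pip install "sentence-transformers>=5.4" "transformers>=5.6.0"
# Optional, only needed for the flash_attention_2 path below:
pip install kernels
```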
Details
This is a mirror of the https://huggingface.co/Tevatron/OmniEmbed-v0.1/discussions/3 PR, but updated for the multivent model instead. Apart from small edits such as the title in the README, all changes are the same, and the model runs with the same common format as other Sentence Transformers-compatible multimodal models:
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Tevatron/OmniEmbed-v0.1-multivent",
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # pip install kernels; recommended but not mandatory
    },
    revision="refs/pr/3",
)

# For video on smaller GPUs, cap the processor up front:
model[0].processing_kwargs.update({
    "video": {"max_pixels": 64 * 28 * 28, "do_sample_frames": True, "fps": 1},
})
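For a sense of what the 64 * 28 * 28 cap buys you: assuming the Qwen-style convention of 28x28 pixels per visual token (an assumption about the underlying processor, not something this PR states), the processor resizes each sampled frame until its pixel count fits the budget. A quick sketch of the arithmetic:

```python
# Pixel budget implied by the cap above, assuming 28x28 pixels per
# visual token as in Qwen-style processors (an assumption).
patch = 28 * 28          # pixels per visual token
max_pixels = 64 * patch  # 50176 pixels, i.e. at most ~64 tokens per frame

def fits_budget(height: int, width: int, budget: int = max_pixels) -> bool:
    """Rough check: a frame larger than the budget would be downscaled."""
    return height * width <= budget

print(max_pixels)              # 50176
print(fits_budget(224, 224))   # True: exactly 50176 pixels
print(fits_budget(720, 1280))  # False: would be resized down
```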
example_query = "How many input modality does Qwen2.5-Omni support?"
example_images = [
    "https://huggingface.co/Tevatron/OmniEmbed-v0.1/resolve/main/assets/qwen2.5omni_hgf.png",
    "https://huggingface.co/Tevatron/OmniEmbed-v0.1/resolve/main/assets/llama4_hgf.png",
]
query_embedding = model.encode_query(example_query)
document_embeddings = model.encode_document(example_images, batch_size=1)
print(model.similarity(query_embedding, document_embeddings))
# tensor([[0.5338, 0.3028]])
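For context on the similarity call: SentenceTransformer.similarity defaults to cosine similarity (unless the model configuration specifies otherwise), which is just a matrix product of row-normalized embeddings. A minimal sketch in plain torch, on toy vectors rather than real model output:

```python
import torch

def cosine_similarity_matrix(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Normalize each row to unit length, then a plain matrix product
    # yields all pairwise cosine similarities.
    a = a / a.norm(dim=-1, keepdim=True)
    b = b / b.norm(dim=-1, keepdim=True)
    return a @ b.T

q = torch.tensor([[1.0, 0.0]])                  # one "query" embedding
d = torch.tensor([[1.0, 0.0], [0.0, 1.0]])      # two "document" embeddings
print(cosine_similarity_matrix(q, d))           # tensor([[1., 0.]])
```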
Let me know if you have any questions or feedback!
- Tom Aarsen