---
datasets:
- Jasaxion/LexSemBridge_eval
language:
- en
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: feature-extraction
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- lexsembridge
widget: []
---

# LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation
This model implements **LexSemBridge**, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using statistical, learned, and contextual paradigms, and integrates them with dense embeddings via element-wise interaction. With appropriate tokenization, the framework extends naturally to both text and vision modalities, and it targets fine-grained retrieval tasks where precise keyword alignment and span-level localization are crucial.
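As a rough, hypothetical sketch of this mechanism (not the paper's exact formulation), the snippet below shows how a token-derived enhancement vector could modulate a dense embedding element-wise; the lookup table, gating form, and token ids are illustrative stand-ins for the statistical, learned, and contextual variants described in the paper.

```python
# Hypothetical sketch of token-aware element-wise modulation; the table and
# gating form are illustrative only, not the paper's actual constructions.
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size = 1024, 30522

# Stand-in for a per-token enhancement table (statistical, learned, or contextual).
token_table = rng.normal(scale=0.02, size=(vocab_size, dim))

def enhancement_from_tokens(token_ids: np.ndarray) -> np.ndarray:
    """Aggregate per-token vectors into one latent enhancement vector."""
    return token_table[token_ids].mean(axis=0)

dense = rng.normal(size=dim)                  # stand-in for the encoder's dense embedding
token_ids = np.array([101, 7592, 2088, 102])  # example input token ids

enhanced = dense * (1.0 + enhancement_from_tokens(token_ids))  # element-wise interaction
enhanced /= np.linalg.norm(enhanced)                           # L2-normalize, as the model head does
print(enhanced.shape)  # (1024,)
```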
The model is based on the paper [LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation](https://huggingface.co/papers/2508.17858).

For the official code and further details, please refer to the [GitHub repository](https://github.com/Jasaxion/LexSemBridge/).
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources

- **Paper:** [LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation](https://huggingface.co/papers/2508.17858)
- **Official Code:** [GitHub repository](https://github.com/Jasaxion/LexSemBridge/)
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository (Sentence Transformers Library):** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face (Sentence Transformers Models):** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
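After loading the model, you can reproduce this module listing and confirm the properties from the Model Description above:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Jasaxion/LexSemBridge_CLR_snowflake")
print(model)                                     # prints the module stack shown above
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
```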
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (example: the LexSemBridge-CLR-snowflake checkpoint)
model = SentenceTransformer("Jasaxion/LexSemBridge_CLR_snowflake")

# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
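Since the final `Normalize()` module produces unit-length embeddings, cosine similarity coincides with the dot product. Continuing from the snippet above, a quick NumPy check (illustrative, not part of the official example):

```python
import numpy as np

# Unit-length embeddings: the dot product equals cosine similarity.
dot_scores = embeddings @ embeddings.T
print(np.allclose(dot_scores, similarities.numpy(), atol=1e-5))
# True
```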
## Training Details

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.1.0.dev0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 2.21.0
- Tokenizers: 0.19.1
## Citation

### BibTeX

```bibtex
@article{zhan2025lexsembridge,
  title={LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation},
  author={Zhan, Shaoxiong and Lin, Hai and Tan, Hongming and Cai, Xiaodong and Zheng, Hai-Tao and Su, Xin and Shan, Zifei and Liu, Ruitong and Kim, Hong-Gee},
  journal={arXiv preprint arXiv:2508.17858},
  year={2025}
}
```