---
datasets:
- Jasaxion/LexSemBridge_eval
language:
- en
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: feature-extraction
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- lexsembridge
widget: []
---

# LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation
This model implements **LexSemBridge**, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using statistical, learned, and contextual paradigms, and integrates them with dense embeddings via element-wise interaction. With appropriate tokenization, the framework extends naturally to both text and vision modalities, and it targets fine-grained retrieval tasks where precise keyword alignment and span-level localization are crucial.
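As a rough, hypothetical sketch of this mechanism (not the paper's exact formulation), the snippet below shows how a token-derived enhancement vector could modulate a dense embedding element-wise; the lookup table, gating form, and token ids are illustrative stand-ins for the statistical, learned, and contextual variants described in the paper.

```python
# Hypothetical sketch of token-aware element-wise modulation; the table and
# gating form are illustrative only, not the paper's actual constructions.
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size = 1024, 30522

# Stand-in for a per-token enhancement table (statistical, learned, or contextual).
token_table = rng.normal(scale=0.02, size=(vocab_size, dim))

def enhancement_from_tokens(token_ids: np.ndarray) -> np.ndarray:
    """Aggregate per-token vectors into one latent enhancement vector."""
    return token_table[token_ids].mean(axis=0)

dense = rng.normal(size=dim)                  # stand-in for the encoder's dense embedding
token_ids = np.array([101, 7592, 2088, 102])  # example input token ids

enhanced = dense * (1.0 + enhancement_from_tokens(token_ids))  # element-wise interaction
enhanced /= np.linalg.norm(enhanced)                           # L2-normalize, as the model head does
print(enhanced.shape)  # (1024,)
```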
The model is based on the paper [LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation](https://huggingface.co/papers/2508.17858).

For the official code and further details, please refer to the [GitHub repository](https://github.com/Jasaxion/LexSemBridge/).
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources

- **Paper:** [LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation](https://huggingface.co/papers/2508.17858)
- **Official Code:** [GitHub repository](https://github.com/Jasaxion/LexSemBridge/)
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository (Sentence Transformers Library):** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face (Sentence Transformers Models):** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
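After loading the model, you can reproduce this module listing and confirm the properties from the Model Description above:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Jasaxion/LexSemBridge_CLR_snowflake")
print(model)                                     # prints the module stack shown above
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
```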
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (example: the LexSemBridge-CLR-snowflake checkpoint)
model = SentenceTransformer("Jasaxion/LexSemBridge_CLR_snowflake")

# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
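Since the final `Normalize()` module produces unit-length embeddings, cosine similarity coincides with the dot product. Continuing from the snippet above, a quick NumPy check (illustrative, not part of the official example):

```python
import numpy as np

# Unit-length embeddings: the dot product equals cosine similarity.
dot_scores = embeddings @ embeddings.T
print(np.allclose(dot_scores, similarities.numpy(), atol=1e-5))
# True
```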
## Training Details

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.1.0.dev0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 2.21.0
- Tokenizers: 0.19.1
## Citation

### BibTeX

```bibtex
@article{zhan2025lexsembridge,
  title={LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation},
  author={Zhan, Shaoxiong and Lin, Hai and Tan, Hongming and Cai, Xiaodong and Zheng, Hai-Tao and Su, Xin and Shan, Zifei and Liu, Ruitong and Kim, Hong-Gee},
  journal={arXiv preprint arXiv:2508.17858},
  year={2025}
}
```