Optimized Mistral-Hermes Merge (3B Parameters)
This is a merge of two pre-trained Mistral-7B-based language models, created with mergekit. Only the first 12 of their 32 decoder layers are kept, which brings the size down from about 7B to about 3B parameters while aiming to retain the core capabilities of the originals.
Model Size Optimization
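The parameter count was reduced from 7B to about 3B by layer truncation alone; the other steps change the storage format and how the two models' weights are combined:
- Layer reduction from 32 to 12 layers, by keeping only layer_range [0, 12] (layers 0 to 11) of each source model
- Conversion to bfloat16, a 16-bit floating-point format that halves memory per parameter relative to float32 without changing the parameter count
- SLERP merging of the two truncated models, which blends their weights tensor by tensor and also leaves the parameter count unchanged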
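The size reduction can be checked with some arithmetic. This sketch assumes the standard Mistral-7B v0.1 dimensions (hidden size 4096, 8 key/value heads of size 128, MLP intermediate size 14336, vocabulary of 32000 tokens, untied input and output embeddings); these values are not stated in this card. It shows that keeping 12 layers accounts for the drop in parameter count, while bfloat16 only changes the bytes stored per parameter.

```python
# Assumed Mistral-7B v0.1 dimensions (not taken from this model card).
HIDDEN = 4096
KV_DIM = 8 * 128          # 8 key/value heads x head size 128 (grouped-query attention)
INTERMEDIATE = 14336
VOCAB = 32000


def layer_params() -> int:
    """Parameters in one decoder layer (Mistral uses no bias terms)."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up and down projections
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params(num_layers: int) -> int:
    """Decoder layers plus input embeddings, untied LM head and final norm."""
    return num_layers * layer_params() + 2 * VOCAB * HIDDEN + HIDDEN


for layers in (32, 12):
    n = total_params(layers)
    # bfloat16 stores each parameter in 2 bytes, float32 in 4 bytes.
    print(f"{layers} layers: {n / 1e9:.2f}B params, "
          f"{n * 2 / 1e9:.1f} GB in bf16, {n * 4 / 1e9:.1f} GB in fp32")
```

Under these assumptions the 32-layer model comes out at about 7.24B parameters and the 12-layer model at about 2.88B, which matches the "approximately 3B" figure; in bfloat16 the smaller model needs roughly 5.8 GB for its weights.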
About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in generative AI. I am passionate about artificial intelligence and language model optimization, and I focus on creating efficient model merges that balance performance and resource usage.
🔗 Connect with me on LinkedIn
Merge Details
Merge Method & Optimization
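This model was merged using the SLERP (spherical linear interpolation) method with the following settings:
- Only layers 0 to 11 (12 layers) of each source model are kept, for better memory efficiency
- Weights are stored in bfloat16
- The interpolation factor t varies by layer and module type: self-attention weights go from t = 0.0 (base model only) in the first layer to 0.5 in the last, MLP weights go from 1.0 (NeuralHermes only) to 0.5, and all other tensors use 0.5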
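For readers unfamiliar with SLERP, here is a minimal NumPy sketch of spherical linear interpolation between two weight tensors of the same shape. It is written in the spirit of mergekit's SLERP, not copied from it; the function name `slerp` and the `eps` fallback threshold are illustrative choices. It assumes t = 0 returns the base model's tensor and t = 1 the other model's. The angle is measured between the normalized tensors, and nearly parallel tensors fall back to plain linear interpolation, where the SLERP formula becomes numerically unstable.

```python
import numpy as np


def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two tensors of equal shape.

    t = 0 returns `a` (assumed to be the base model), t = 1 returns `b`.
    """
    a_flat = a.ravel().astype(np.float64)
    b_flat = b.ravel().astype(np.float64)
    # Angle between the two tensors, computed from their normalized directions.
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_dir, b_dir), -1.0, 1.0)
    theta = np.arccos(dot)
    if np.sin(theta) < eps:
        # Nearly (anti)parallel tensors: fall back to linear interpolation.
        out = (1.0 - t) * a_flat + t * b_flat
    else:
        # Weight each original tensor by the SLERP coefficients.
        out = (np.sin((1.0 - t) * theta) * a_flat
               + np.sin(t * theta) * b_flat) / np.sin(theta)
    return out.reshape(a.shape).astype(a.dtype)


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(slerp(0.5, a, b))
```

For these two orthogonal unit vectors, `slerp(0.5, a, b)` gives about [0.707, 0.707], which stays on the unit circle, whereas plain linear interpolation would give [0.5, 0.5] and shrink the norm. Preserving the geometry of the weights in this way is the usual motivation for preferring SLERP over a simple weighted average.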
Models Merged
The following models were included in the merge:
- OpenPipe/mistral-ft-optimized-1218 (base model)
- mlabonne/NeuralHermes-2.5-Mistral-7B
Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model: OpenPipe/mistral-ft-optimized-1218
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.0, 0.5]
  - filter: mlp
    value: [1.0, 0.5]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 12]
    model: OpenPipe/mistral-ft-optimized-1218
  - layer_range: [0, 12]
    model: mlabonne/NeuralHermes-2.5-Mistral-7B
```
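As a hedged sketch of how the `t` settings in this configuration map onto individual tensors, the code below assumes that each two-value list is spread evenly from the first to the last of the 12 layers, and that a filter matches any tensor whose name contains it. This is our reading of mergekit's gradient and filter syntax, not verified against its source. The tensor names follow the Hugging Face Mistral layout (e.g. `model.layers.0.self_attn.q_proj.weight`); `t_for` and `T_SCHEDULE` are hypothetical names.

```python
import numpy as np

# Gradients copied from the YAML config; the spreading rule is an assumption.
T_SCHEDULE = [
    ("self_attn", [0.0, 0.5]),
    ("mlp", [1.0, 0.5]),
]
DEFAULT_T = 0.5   # the unfiltered "- value: 0.5" entry
NUM_LAYERS = 12   # layer_range [0, 12] keeps layers 0..11


def t_for(tensor_name: str, layer_idx: int) -> float:
    """Hypothetical helper: interpolation factor for one tensor in one layer."""
    pos = layer_idx / (NUM_LAYERS - 1)  # 0.0 at the first layer, 1.0 at the last
    for fragment, anchors in T_SCHEDULE:
        if fragment in tensor_name:
            # Evenly spaced anchor positions, linearly interpolated between.
            xs = np.linspace(0.0, 1.0, num=len(anchors))
            return float(np.interp(pos, xs, anchors))
    return DEFAULT_T


for layer in (0, 5, 11):
    attn = t_for(f"model.layers.{layer}.self_attn.q_proj.weight", layer)
    mlp = t_for(f"model.layers.{layer}.mlp.gate_proj.weight", layer)
    print(f"layer {layer:2d}: self_attn t={attn:.3f}, mlp t={mlp:.3f}")
```

Under these assumptions, layer 0 takes its attention weights entirely from the base model and its MLP weights entirely from NeuralHermes, and both converge to an even 0.5 blend at layer 11; layer norms and other unfiltered tensors stay at 0.5 throughout.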