Instructions to use ControlLLM/Llama-3.1-8B-SynE-FPT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ControlLLM/Llama-3.1-8B-SynE-FPT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ControlLLM/Llama-3.1-8B-SynE-FPT")

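# Load model directly (causal-LM head is needed for text generation)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ControlLLM/Llama-3.1-8B-SynE-FPT")
model = AutoModelForCausalLM.from_pretrained("ControlLLM/Llama-3.1-8B-SynE-FPT", dtype="auto")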
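
The snippets above load the model but stop short of producing text. The following is a minimal sketch of a single generation call, assuming `transformers` with a PyTorch backend is installed and the machine has enough memory for an 8B model. The helper names `generation_settings` and `generate`, and the `max_new_tokens` default, are illustrative choices and not part of the model card.

```python
def generation_settings(max_new_tokens=64):
    """Return keyword arguments for a short, greedy (non-sampling) generation call."""
    return {"max_new_tokens": max_new_tokens, "do_sample": False}


def generate(prompt, model_id="ControlLLM/Llama-3.1-8B-SynE-FPT"):
    """Load the model through a text-generation pipeline and complete `prompt`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)
    outputs = pipe(prompt, **generation_settings())
    # For a single string prompt the pipeline returns a list of dicts,
    # one per generated sequence.
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use):
# print(generate("Once upon a time,"))
```

Greedy decoding is used here only to make repeated runs comparable; sampling parameters such as `temperature` can be passed instead.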

vLLM

How to use ControlLLM/Llama-3.1-8B-SynE-FPT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ControlLLM/Llama-3.1-8B-SynE-FPT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Llama-3.1-8B-SynE-FPT",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
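
The same request can be issued from Python. This is a sketch using only the standard library, assuming a server started as shown above is listening locally; the function names `build_completion_request` and `complete` are hypothetical helpers, and the response is assumed to follow the OpenAI completions schema (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="ControlLLM/Llama-3.1-8B-SynE-FPT",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires a running server):
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead, which is the port used in the SGLang commands in this document.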

Use Docker

docker model run hf.co/ControlLLM/Llama-3.1-8B-SynE-FPT

SGLang

How to use ControlLLM/Llama-3.1-8B-SynE-FPT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ControlLLM/Llama-3.1-8B-SynE-FPT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Llama-3.1-8B-SynE-FPT",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ControlLLM/Llama-3.1-8B-SynE-FPT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Llama-3.1-8B-SynE-FPT",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ControlLLM/Llama-3.1-8B-SynE-FPT with Docker Model Runner:
```
docker model run hf.co/ControlLLM/Llama-3.1-8B-SynE-FPT
```

	---
	license: llama3.1
	datasets:
	- survivi/Llama-3-SynE-Dataset
	- hfl/stem_zh_instruction
	- llamafactory/alpaca_zh
	- llamafactory/alpaca_gpt4_zh
	- hfl/ruozhiba_gpt4
	- codingsteven/Llama-3-8B-chat
	language:
	- zh
	metrics:
	- accuracy
	base_model:
	- meta-llama/Llama-3.1-8B
	model-index:
	- name: Control-LLM-Llama3.1-8B-SynE-Full-Parameter-Tuning
	  results:
	  - task:
	      type: pretraining-evaluation
	    dataset:
	      type: mixed
	      name: Pretraining Evaluation Dataset
	    metrics:
	    - name: exact_match,strict-match (meta_pretrain)
	      type: exact_match
	      value: 0.45445720757159036
	      stderr: 0.0035036029889520047
	      verified: false
	    - name: exact_match,strict-match (meta_bbh_3shot_cot_pretrain)
	      type: exact_match
	      value: 0.6482875134387959
	      stderr: 0.005918167158231359
	      verified: false
	    - name: acc,none (meta_mmlu_5shot_pretrain)
	      type: accuracy
	      value: 0.649480131035465
	      stderr: 0.004026616190778244
	      verified: false
	    - name: exact_match,strict-match (meta_mmlu_pro_5shot_pretrain)
	      type: exact_match
	      value: 0.34956781914893614
	      stderr: 0.004347262544061378
	      verified: false
	  - task:
	      type: chinese-evaluation
	    dataset:
	      type: mixed
	      name: Chinese Evaluation Dataset
	    metrics:
	    - name: acc,none (ceval-valid)
	      type: accuracy
	      value: 0.5898959881129272
	      stderr: 0.012699457390113113
	      verified: false
	    - name: exact_match,strict-match (ceval-valid-pretrain-cot_zh)
	      type: exact_match
	      value: 0.40193164933135217
	      stderr: 0.01265090064840271
	      verified: false
	    - name: acc,none (cmmlu)
	      type: accuracy
	      value: 0.6018822310481782
	      stderr: 0.004420298073040671
	      verified: false
	    - name: exact_match,strict-match (cmmlu_pretrain_cot_zh)
	      type: exact_match
	      value: 0.4425833189431877
	      stderr: 0.004506238417180843
	      verified: false
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Control-LLM-Llama3.1-8B-SynE-Full-Parameter-Tuning
	This is a fine-tuned version of Llama-3.1-8B for multilingual (Chinese) tasks, trained on the SynE dataset.

	## Linked Paper
	This model is associated with the paper: [Control LLM: Controlled Evolution for Intelligence Retention in LLM](https://huggingface.co/papers/2501.10979).

	## Linked Open-Source Code: Training, Evaluation, and Benchmarks
	The code for this model is available in the GitHub repository [Control-LLM](https://github.com/linkedin/ControlLLM).

	## Evaluation Results
	Here is an overview of the evaluation results and findings:

	### Benchmark Results Table

	The table below summarizes evaluation results across Chinese tasks and original capabilities.

	| Model            | CEval | CEvalC | CMMLU | CMMLUC | C-Avg | BBH  | MLU  | MLUP | O-Avg | Overall |
	|------------------|-------|--------|-------|--------|-------|------|------|------|-------|---------|
	| Llama3.1-8B      | 48.3  | 12.8   | 51.1  | 14.1   | 13.9  | 65.2 | 65.4 | 35.5 | 45.9  | 29.9    |
	| Llama-3-SynE     | 57.7  | 22.3   | 57.1  | 22.8   | 22.8  | 61.9 | 64.0 | 32.6 | 42.9  | 32.9    |
	| Full Param Tune  | 59.0  | 40.2   | 60.2  | 44.3   | 43.8  | 64.8 | 64.9 | 35.0 | 45.4  | 44.6    |
	| Stack Expansion  | 56.0  | 32.7   | 55.2  | 33.4   | 33.3  | 62.3 | 65.6 | 35.3 | 44.8  | 39.1    |
	| Concat-Lerp*     | 57.1  | 34.8   | 57.0  | 37.4   | 37.1  | 64.4 | 64.6 | 35.8 | 45.9  | 41.5    |
	| Hybrid Expansion | 58.9  | 44.7   | 57.9  | 44.3   | 44.4  | 65.1 | 65.7 | 36.9 | 46.8  | 45.6    |
	| Control LLM*     | 57.0  | 44.7   | 56.0  | 44.9   | 44.8  | 68.2 | 65.6 | 37.9 | 48.5  | 46.7    |

	---

	### Explanation:
	- CEval: Chinese Evaluation
	- CEvalC: Chinese Evaluation (CoT - Chain of Thought)
	- CMMLU: Chinese MMLU
	- CMMLUC: Chinese MMLU (CoT)
	- C-Avg: Chinese - Size Weighted Average across CEval, CEvalC, CMMLU, and CMMLUC
	- BBH: BigBench Hard
	- MLU: MMLU (Massive Multitask Language Understanding)
	- MLUP: MMLU Pro
	- O-Avg: Original Capability - Size Weighted Average across BBH, MLU, and MLUP
	- Overall: Combined average across all tasks
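
The raw scores in the model card's front matter line up with the "Full Param Tune" row once converted to percentages. The sketch below shows that conversion and adds an approximate 95% interval from each reported standard error. The mapping from front-matter metric names to column abbreviations (for example `ceval-valid` to CEval) is an assumption based on matching values, and the normal-approximation interval is an illustration, not something the card reports.

```python
# Values copied from the model card's front matter: (score, standard error).
# The column labels on the left are an assumed mapping, see the note above.
FRONT_MATTER_METRICS = {
    "CEval": (0.5898959881129272, 0.012699457390113113),
    "CEvalC": (0.40193164933135217, 0.01265090064840271),
    "CMMLU": (0.6018822310481782, 0.004420298073040671),
    "CMMLUC": (0.4425833189431877, 0.004506238417180843),
    "BBH": (0.6482875134387959, 0.005918167158231359),
    "MLU": (0.649480131035465, 0.004026616190778244),
    "MLUP": (0.34956781914893614, 0.004347262544061378),
}


def as_percent(score):
    """Convert a 0-1 score to a percentage rounded to one decimal place."""
    return round(score * 100, 1)


def confidence_interval(score, stderr, z=1.96):
    """Approximate 95% interval in percent, assuming a normal sampling distribution."""
    return (as_percent(score - z * stderr), as_percent(score + z * stderr))


for name, (score, stderr) in FRONT_MATTER_METRICS.items():
    low, high = confidence_interval(score, stderr)
    print(f"{name}: {as_percent(score)} (approx. 95% CI {low} to {high})")
```

The intervals for the two CEval metrics are noticeably wider than the others because their reported standard errors are roughly three times larger.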

	### Full Parameter Tuning on Chinese-SynE
	The following plot illustrates the catastrophic forgetting caused by full-parameter tuning, measured as drift in hidden-state alignment.

	![Catastrophic Forgetting](plots/alignment_worst.png)
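
The exact drift metric behind the plot is defined in the linked paper, not in this card. As an illustrative sketch only, the code below assumes drift is measured per layer as one minus the cosine similarity between the base model's and the tuned model's hidden states for the same input. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_drift(base_states, tuned_states):
    """Per-layer drift: 1 minus the cosine similarity of the two models' hidden states.

    Each argument is a list with one hidden-state vector per layer, taken at the
    same token position for the same input.
    """
    return [1.0 - cosine_similarity(b, t) for b, t in zip(base_states, tuned_states)]


# Toy vectors standing in for real hidden states (two layers, three dimensions).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tuned = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(alignment_drift(base, tuned))
```

A drift of zero means the tuned model's hidden state points in the same direction as the base model's at that layer; larger values indicate that tuning has moved the representation further from the original.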