Upload folder using huggingface_hub

- .gitattributes +2 -0
- README.md +106 -0
- config.json +33 -0
- images/benchmark_results.png +3 -0
- images/eval_metrics_phase1.png +0 -0
- images/eval_metrics_phase2.png +0 -0
- model-00001-of-00002.safetensors +3 -0
- model-00002-of-00002.safetensors +3 -0
- model.safetensors.index.json +22 -0
.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+benchmark_results.png filter=lfs diff=lfs merge=lfs -text
+images/benchmark_results.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED

@@ -0,0 +1,106 @@
---
license: mit
base_model: moonshotai/Kimi-K2.5
tags:
- speculative-decoding
- eagle3
- draft-model
- kimi-k2.5
---
## Model Overview

**kimi-k2.5-eagle3** is an Eagle3 MTP draft model for accelerating inference of [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5), trained with **[TorchSpec](https://github.com/torchspec-project/TorchSpec)**, an online speculative decoding training framework that runs FSDP training and inference concurrently.

Training data is available at [lightseekorg/kimi-mtp-dataset](https://huggingface.co/datasets/lightseekorg/kimi-mtp-dataset).
### Training Setup

- **Cluster**: 4 nodes × 8× H200 (32 GPUs total)
- **Training**: 2 nodes (16 GPUs), FSDP
- **Inference**: 2 nodes (16 GPUs), Engine (TP=8 per node)
- **Duration**: ~14 hours per phase

Training ran in two phases, each 20k steps (~300k samples):

- **Phase 1**: Regenerated [open-perfectblend](https://huggingface.co/datasets/mlabonne/open-perfectblend) dataset
- **Phase 2**: Mixed dataset (English, VL, Chinese, function-call, agent, creative writing)

All training responses were **regenerated by Kimi-K2.5 via Engine** to match the base model's exact token distribution.
### Training Curves

The plots show loss, token acceptance accuracy, and simulated accept_length during training. Both eval sets contain 256 samples drawn from each phase's own training corpus.

**Phase 1 (steps 0 → 20k):**



**Phase 2 (steps 20k → 40k):**



---
## Performance

The primary metric is **accept_length**: the average number of tokens accepted per speculation step with `topk=1, num_steps=3, num_draft_tokens=4`. Higher is better.

Benchmarks were run using [SpecForge](https://github.com/sgl-project/SpecForge/blob/main/benchmarks)'s `bench_eagle3.py`. BFCL v3 benchmarks (†) use a custom extension to the original script.


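As a back-of-the-envelope reading of these numbers (our own sketch, not part of the benchmark script): if each speculation round costs one target-model verification pass plus `num_steps` draft passes, each at some small fraction of the target's cost, then the implied speedup over plain decoding is roughly `accept_length / (1 + cost_ratio * num_steps)`.

```python
# Hypothetical helper, not from SpecForge: rough wall-clock speedup implied
# by an accept_length measurement. Baseline decoding emits 1 token per
# target forward pass; speculation emits `accept_length` tokens per round,
# paying one target verification pass plus `num_steps` cheap draft passes.
def estimated_speedup(accept_length: float, num_steps: int = 3,
                      draft_cost_ratio: float = 0.05) -> float:
    return accept_length / (1.0 + draft_cost_ratio * num_steps)

# e.g. GSM8K after Phase 2 (accept_length = 3.201):
print(f"{estimated_speedup(3.201):.2f}x")  # prints "2.78x"
```

The `draft_cost_ratio` of 0.05 is an assumption (plausible for a 1-layer draft against the full base model); real speedup also depends on batching and kernel overheads.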
| Category | Dataset | n | Phase 1 (20k steps) | Phase 2 (40k steps) |
|----------|---------|---|---------------------|---------------------|
| Dialogue | [MTBench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) | 80 | 2.624 | 2.687 |
| Chinese | [CEval](https://huggingface.co/datasets/ceval/ceval-exam) | 212 | 1.482 | 2.295 |
| Math | [GSM8K](https://github.com/openai/grade-school-math) | 500 | 3.123 | 3.201 |
| Code | [HumanEval](https://huggingface.co/datasets/openai/openai_humaneval) | 164 | 3.242 | 3.285 |
| Math | [MATH500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500) | 500 | 3.323 | 3.342 |
| Math | [AIME](https://huggingface.co/datasets/Maxwell-Jia/AIME_2024) | 30 | 2.972 | 3.033 |
| VL | [MMStar](https://huggingface.co/datasets/Lin-Chen/MMStar) | 200 | 2.566 | 2.787 |
| Function Call † | [BFCL v3](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard) simple | 400 | 3.729 | 3.798 |
| Function Call † | BFCL v3 multiple | 200 | 3.745 | 3.809 |
| Function Call † | BFCL v3 parallel | 200 | 3.596 | 3.669 |
| Function Call † | BFCL v3 parallel_multiple | 200 | 3.525 | 3.601 |
| Function Call † | BFCL v3 live_simple | 1547 | 3.515 | 3.667 |
| Function Call † | BFCL v3 live_multiple | 1030 | 3.407 | 3.453 |
| Function Call † | BFCL v3 live_parallel | 97 | 3.303 | 3.410 |
| Function Call † | BFCL v3 live_parallel_multiple | 170 | 3.070 | 3.159 |

---
## Quick Start

### Requirements

- NVIDIA GPU with CUDA 12.0+
- [SGLang](https://github.com/sgl-project/sglang) ≥ 0.5.8

### Launch Server

```bash
python -m sglang.launch_server \
  --model-path /path/to/Kimi-K2.5 \
  --tp 8 \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path lightseekorg/kimi-k2.5-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.75 \
  --dtype bfloat16
```
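Once up, the server speaks SGLang's OpenAI-compatible HTTP API. A minimal client sketch, assuming the default `localhost:30000` and the `/v1/chat/completions` route (the `build_chat_request` helper is ours, for illustration):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       host: str = "http://localhost:30000") -> urllib.request.Request:
    # Assemble an OpenAI-style chat completion request for the SGLang server.
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# with urllib.request.urlopen(build_chat_request("What is 17 * 24?")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Speculative decoding is transparent to clients: responses are token-identical in greedy mode, only faster.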
### Run Benchmarks

```bash
python bench_eagle3.py \
  --model-path /path/to/Kimi-K2.5 \
  --port 30000 \
  --config-list 1,3,1,4 \
  --benchmark-list <benchmark_name> \
  --skip-launch-server
```

`--config-list` format: `topk,num_steps,topk,num_draft_tokens`.
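To sanity-check an entry before passing it, the format string above can be unpacked like this (a hypothetical helper, not part of `bench_eagle3.py`; the second `topk` field is renamed `eagle_topk` here only to keep the keys unique):

```python
# Hypothetical helper: split one --config-list entry into the four fields
# named in the format string above.
def parse_config_entry(entry: str) -> dict:
    values = [int(v) for v in entry.split(",")]
    assert len(values) == 4, "expected topk,num_steps,topk,num_draft_tokens"
    keys = ("topk", "num_steps", "eagle_topk", "num_draft_tokens")
    return dict(zip(keys, values))

print(parse_config_entry("1,3,1,4"))
```

The value `1,3,1,4` matches the serving flags above (`--speculative-eagle-topk 1`, `--speculative-num-steps 3`, `--speculative-num-draft-tokens 4`), so benchmark and server measure the same configuration.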
config.json ADDED

@@ -0,0 +1,33 @@
{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 163584,
  "draft_vocab_size": 163840,
  "dtype": "float16",
  "eos_token_id": 163585,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 7168,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 262144,
  "max_window_layers": 36,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 1,
  "num_key_value_heads": 64,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 163840
}
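Two details of this config worth noting: the draft is a single Llama-style decoder layer (`num_hidden_layers: 1`), and its attention width `num_attention_heads × head_dim = 8192` differs from `hidden_size = 7168`, which is valid when `head_dim` is set explicitly (the q/k/v projections map 7168 → 8192). A quick check over an inline copy of the relevant fields:

```python
import json

# Inline copy of the relevant fields from config.json above.
cfg = json.loads("""{
  "hidden_size": 7168,
  "head_dim": 128,
  "num_attention_heads": 64,
  "num_hidden_layers": 1,
  "vocab_size": 163840,
  "draft_vocab_size": 163840
}""")

attn_width = cfg["num_attention_heads"] * cfg["head_dim"]
print(attn_width, cfg["hidden_size"])  # prints "8192 7168"
```

The draft vocabulary also matches the full vocabulary (`draft_vocab_size == vocab_size`), so no token-remapping table is needed at verification time.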
images/benchmark_results.png ADDED (Git LFS)

images/eval_metrics_phase1.png ADDED

images/eval_metrics_phase2.png ADDED
model-00001-of-00002.safetensors ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:55479ddb915be4c9ecb7e4b1920d38f7c6ef46f8e9e3ef0badfdec3c8527294e
size 4007716200
model-00002-of-00002.safetensors ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff1819b92df82db085d4c1f71acdeffe28dbd44bb130a6fc65fd9d4e656b817b
size 2348810368
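Both weight shards are stored as Git LFS pointer files like the ones shown above. The three-field format is easy to parse; a small sketch (our own helper, following the git-lfs pointer layout):

```python
# Parse a Git LFS pointer file (version / oid / size lines) into a dict.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:55479ddb915be4c9ecb7e4b1920d38f7c6ef46f8e9e3ef0badfdec3c8527294e
size 4007716200"""
print(parse_lfs_pointer(pointer)["size_bytes"])  # prints "4007716200" (~4.0 GB)
```

The `oid` is the SHA-256 of the actual shard contents, so it doubles as a checksum after download.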
model.safetensors.index.json ADDED

@@ -0,0 +1,22 @@
{
  "metadata": {
    "total_parameters": 3178262528,
    "total_size": 6356525056
  },
  "weight_map": {
    "embed_tokens.weight": "model-00001-of-00002.safetensors",
    "fc.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "midlayer.hidden_norm.weight": "model-00001-of-00002.safetensors",
    "midlayer.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "norm.weight": "model-00001-of-00002.safetensors"
  }
}
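The index metadata is self-consistent with the `"dtype": "float16"` declared in config.json: at 2 bytes per parameter, `total_size` should equal `total_parameters × 2` exactly (the on-disk shard files are slightly larger because each safetensors file also carries a JSON header):

```python
# Values copied from the metadata block above.
total_parameters = 3_178_262_528
total_size = 6_356_525_056  # bytes of tensor data across both shards
bytes_per_param = 2         # float16, per config.json

assert total_parameters * bytes_per_param == total_size
print("index metadata consistent with float16 weights")
```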