lightseek committed on
Commit ca8a4db · verified · 1 Parent(s): a86c9f8

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+benchmark_results.png filter=lfs diff=lfs merge=lfs -text
+images/benchmark_results.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
---
license: mit
base_model: moonshotai/Kimi-K2.5
tags:
- speculative-decoding
- eagle3
- draft-model
- kimi-k2.5
---

## Model Overview

**kimi-k2.5-eagle3** is an Eagle3 MTP draft model for accelerating inference of [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5), trained with **[TorchSpec](https://github.com/torchspec-project/TorchSpec)**, an online speculative decoding training framework that runs FSDP training and inference concurrently.

Training data is available at [lightseekorg/kimi-mtp-dataset](https://huggingface.co/datasets/lightseekorg/kimi-mtp-dataset).

### Training Setup

- **Cluster**: 4 nodes × 8× H200 (32 GPUs total)
- **Training**: 2 nodes (16 GPUs), FSDP
- **Inference**: 2 nodes (16 GPUs), Engine (TP=8 per node)
- **Duration**: ~14 hours per phase

Training ran in two phases, each 20k steps (~300k samples):
- **Phase 1**: Regenerated [open-perfectblend](https://huggingface.co/datasets/mlabonne/open-perfectblend) dataset
- **Phase 2**: Mixed dataset (English, VL, Chinese, function-call, agent, creative writing)

All training responses were **regenerated by Kimi-K2.5 via Engine** to match the base model's exact token distribution.

### Training Curves

The plots show loss, token acceptance accuracy, and simulated accept_length during training. Both eval sets contain 256 samples drawn from each phase's own training corpus.

**Phase 1 (steps 0 → 20k):**

![Phase 1 training curves](images/eval_metrics_phase1.png)

**Phase 2 (steps 20k → 40k):**

![Phase 2 training curves](images/eval_metrics_phase2.png)

---

## Performance

The primary metric is **accept_length**, the average number of tokens accepted per speculation step with `topk=1, num_steps=3, num_draft_tokens=4`. Higher is better.
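
As a rough illustration of how this metric behaves (a minimal sketch, not the benchmark code): each speculation step commits the accepted draft tokens plus one token from the target model itself, and accept_length is the mean committed length per step.

```python
# Hedged sketch of the accept_length metric. Assumption: each speculation
# step commits `accepted + 1` tokens, where the +1 is the target model's
# own token emitted during verification.

def accept_length(accepted_per_step):
    """Mean tokens committed per speculation step (drafts accepted + 1)."""
    if not accepted_per_step:
        return 0.0
    return sum(a + 1 for a in accepted_per_step) / len(accepted_per_step)

# With num_steps=3, at most 3 draft tokens are accepted per step, so the
# metric lies in [1.0, 4.0], consistent with the values in the table below.
print(accept_length([3, 2, 3, 1]))  # (4 + 3 + 4 + 2) / 4 = 3.25
```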

Benchmarks were run using [SpecForge](https://github.com/sgl-project/SpecForge/blob/main/benchmarks)'s `bench_eagle3.py`. BFCL v3 benchmarks (†) use a custom extension to the original script.

![accept_length by dataset and method](images/benchmark_results.png)

| Category | Dataset | n | Phase 1 (20k steps) | Phase 2 (40k steps) |
|----------|---------|---|---------------------|---------------------|
| Dialogue | [MTBench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) | 80 | 2.624 | 2.687 |
| Chinese | [CEval](https://huggingface.co/datasets/ceval/ceval-exam) | 212 | 1.482 | 2.295 |
| Math | [GSM8K](https://github.com/openai/grade-school-math) | 500 | 3.123 | 3.201 |
| Code | [HumanEval](https://huggingface.co/datasets/openai/openai_humaneval) | 164 | 3.242 | 3.285 |
| Math | [MATH500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500) | 500 | 3.323 | 3.342 |
| Math | [AIME](https://huggingface.co/datasets/Maxwell-Jia/AIME_2024) | 30 | 2.972 | 3.033 |
| VL | [MMStar](https://huggingface.co/datasets/Lin-Chen/MMStar) | 200 | 2.566 | 2.787 |
| Function Call † | [BFCL v3](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard) simple | 400 | 3.729 | 3.798 |
| Function Call † | BFCL v3 multiple | 200 | 3.745 | 3.809 |
| Function Call † | BFCL v3 parallel | 200 | 3.596 | 3.669 |
| Function Call † | BFCL v3 parallel_multiple | 200 | 3.525 | 3.601 |
| Function Call † | BFCL v3 live_simple | 1547 | 3.515 | 3.667 |
| Function Call † | BFCL v3 live_multiple | 1030 | 3.407 | 3.453 |
| Function Call † | BFCL v3 live_parallel | 97 | 3.303 | 3.410 |
| Function Call † | BFCL v3 live_parallel_multiple | 170 | 3.070 | 3.159 |

---

## Quick Start

### Requirements

- NVIDIA GPU with CUDA 12.0+
- [SGLang](https://github.com/sgl-project/sglang) ≥ 0.5.8

### Launch Server

```bash
python -m sglang.launch_server \
  --model-path /path/to/Kimi-K2.5 \
  --tp 8 \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path lightseekorg/kimi-k2.5-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.75 \
  --dtype bfloat16
```
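
Once the server is up it serves SGLang's OpenAI-compatible API, so speculative decoding is transparent to clients. A minimal request sketch (port, served model name, and prompt are placeholders for this deployment):

```python
# Build a plain OpenAI-style chat completion request; no client-side changes
# are needed for speculative decoding. Endpoint and model name are assumptions.
import json
import urllib.request

payload = {
    "model": "Kimi-K2.5",  # served model name (placeholder)
    "messages": [{"role": "user", "content": "Summarize EAGLE3 in one sentence."}],
    "max_tokens": 128,
    "temperature": 0.0,
}

req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the server from the previous section running:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```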

### Run Benchmarks

```bash
python bench_eagle3.py \
  --model-path /path/to/Kimi-K2.5 \
  --port 30000 \
  --config-list 1,3,1,4 \
  --benchmark-list <benchmark_name> \
  --skip-launch-server
```

`--config-list` format: `topk,num_steps,topk,num_draft_tokens`.

config.json ADDED

{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 163584,
  "draft_vocab_size": 163840,
  "dtype": "float16",
  "eos_token_id": 163585,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 7168,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 262144,
  "max_window_layers": 36,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 1,
  "num_key_value_heads": 64,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 163840
}

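
As a quick sanity check, the projection shapes implied by these fields can be derived directly (illustrative sketch only; Llama-style `[out_features, in_features]` weight layout assumed):

```python
# Derive the single draft layer's projection shapes from config.json values.
hidden_size = 7168
head_dim = 128
num_attention_heads = 64
num_key_value_heads = 64   # equal to query heads, so no GQA in this draft model
intermediate_size = 12288

q_out = num_attention_heads * head_dim   # 8192, wider than hidden_size
kv_out = num_key_value_heads * head_dim  # 8192

shapes = {
    "q_proj": (q_out, hidden_size),
    "k_proj": (kv_out, hidden_size),
    "v_proj": (kv_out, hidden_size),
    "o_proj": (hidden_size, q_out),      # projects back down to hidden_size
    "gate_proj": (intermediate_size, hidden_size),
}
print(shapes["q_proj"])  # (8192, 7168)
```

Note that `num_attention_heads * head_dim` (8192) exceeds `hidden_size` (7168); `o_proj` maps the attention output back to the hidden width.
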
images/benchmark_results.png ADDED

Git LFS Details

  • SHA256: d59803f08928f47dc7e65d83f04905d4e672becd1547c0faf8be617992e43749
  • Pointer size: 131 Bytes
  • Size of remote file: 173 kB
images/eval_metrics_phase1.png ADDED
images/eval_metrics_phase2.png ADDED
model-00001-of-00002.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:55479ddb915be4c9ecb7e4b1920d38f7c6ef46f8e9e3ef0badfdec3c8527294e
size 4007716200

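
The three lines above are a standard Git LFS pointer file: key/value pairs, one per line, separated by a single space. A small parsing sketch:

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, oid, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "oid_algo": algo,
            "oid": digest, "size": int(fields["size"])}

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:55479ddb915be4c9ecb7e4b1920d38f7c6ef46f8e9e3ef0badfdec3c8527294e
size 4007716200
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 4007716200 bytes, i.e. the ~4 GB first shard
```
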
model-00002-of-00002.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:ff1819b92df82db085d4c1f71acdeffe28dbd44bb130a6fc65fd9d4e656b817b
size 2348810368

model.safetensors.index.json ADDED

{
  "metadata": {
    "total_parameters": 3178262528,
    "total_size": 6356525056
  },
  "weight_map": {
    "embed_tokens.weight": "model-00001-of-00002.safetensors",
    "fc.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "midlayer.hidden_norm.weight": "model-00001-of-00002.safetensors",
    "midlayer.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "midlayer.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "norm.weight": "model-00001-of-00002.safetensors"
  }
}
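
The `weight_map` above tells a loader which shard file holds each tensor. A small sketch of grouping tensors by shard so each file is opened once (a subset of the map is inlined for illustration):

```python
from collections import defaultdict

index = {
    "weight_map": {
        "embed_tokens.weight": "model-00001-of-00002.safetensors",
        "fc.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
        "midlayer.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    }
}

# Invert the map: shard file -> list of tensor names it contains.
by_shard = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    by_shard[shard_file].append(tensor_name)

# Only lm_head.weight lives in the second, smaller shard.
print(by_shard["model-00002-of-00002.safetensors"])  # ['lm_head.weight']
```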