haoyang-amd committed · verified · Commit a424902 · 1 Parent(s): 97b1039

Update README.md

---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3.2-Speciale
---

**Note that the MTP layers of this model are also PTPC-quantized.**

# Model Overview

- **Model Architecture:** DeepSeek-V3.2-Speciale
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.10)
- **Weight quantization:** Per-channel, FP8E4M3, static
- **Activation quantization:** Per-token, FP8E4M3, dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from [deepseek-ai/DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale) by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization.

# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Weights are statically quantized to FP8E4M3 with one scale per output channel, and activations are dynamically quantized to FP8E4M3 with one scale per token.

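The per-channel/per-token scaling scheme can be sketched in a few lines. This is an illustrative fake-quantization sketch, not Quark's actual implementation: an integer grid stands in for the real FP8E4M3 value grid, and only the ±448 dynamic range is taken from the format.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def fake_quant_per_channel(w):
    """Static per-channel weight quantization: one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX  # shape (out, 1)
    q = np.clip(np.round(w / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def fake_quant_per_token(x):
    """Dynamic per-token activation quantization: one scale per token, found at runtime."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX  # shape (tokens, 1)
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))  # weight: 8 output channels, 16 input features
x = rng.standard_normal((4, 16))  # activations: 4 tokens

qw, sw = fake_quant_per_channel(w)
qx, sx = fake_quant_per_token(x)

# The quantized matmul is rescaled by the outer product of the two scale vectors,
# which is why per-token x per-channel scaling folds cheaply into the GEMM epilogue.
deq = (qx @ qw.T) * (sx * sw.T)
ref = x @ w.T
```

Because each token and each output channel gets its own scale, an outlier in one token or channel does not force a coarse scale on the rest, which is the usual motivation for PTPC over per-tensor FP8.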
### Accuracy

<table>
  <tr>
    <td><strong>Benchmark</strong></td>
    <td><strong>DeepSeek-V3.2-Speciale</strong></td>
    <td><strong>DeepSeek-V3.2-Speciale-ptpc (this model)</strong></td>
  </tr>
  <tr>
    <td>gsm8k</td>
    <td>96.00</td>
    <td>95.75</td>
  </tr>
</table>

### Reproduction

- vllm version: 0.11.2.dev521+gad32e3e19.rocm710
- aiter version: 0.1.6.post2.dev55+g59bd8ff2c
- lm_eval version: 0.4.9.2

```shell
export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
model_path="/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc"
vllm serve $model_path \
  --tensor-parallel-size 8 \
  --data-parallel-size 1 \
  --max-num-batched-tokens 32768 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --kv-cache-dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --block-size 1

lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args model=/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc,base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size auto \
  --limit 400
```
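Once the server is up, the OpenAI-compatible endpoint can also be queried directly. A minimal sketch using only the standard library; the base URL and model name mirror the `vllm serve` command above, and the prompt is just a placeholder:

```python
import json
import urllib.request

# These assume the vllm serve command above is running locally.
BASE_URL = "http://127.0.0.1:8000/v1/completions"
MODEL = "/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc"

def build_payload(prompt, max_tokens=32):
    """Assemble an OpenAI-style /v1/completions request body."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0.0}

def complete(prompt):
    """POST the prompt to the server and return the first completion's text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

if __name__ == "__main__":
    print(complete("The capital of France is"))
```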

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.