haoyang-amd committed · verified · Commit a424902 · 1 Parent(s): 97b1039

Update README.md

---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3.2-Speciale
---

**Note that the MTP layers of this model are also PTPC-quantized.**

# Model Overview

- **Model Architecture:** DeepSeek-V3.2-Speciale
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.10)
- **Weight quantization:** Per-channel, FP8E4M3, static
- **Activation quantization:** Per-token, FP8E4M3, dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from [deepseek-ai/DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale) by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization.

# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Weights are statically quantized to FP8E4M3 with one scale per output channel, and activations are dynamically quantized to FP8E4M3 with one scale per token.

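The per-channel/per-token scaling scheme can be sketched in a few lines. This is an illustrative fake-quantization sketch, not Quark's actual implementation: an integer grid stands in for the real FP8E4M3 value grid, and only the ±448 dynamic range is taken from the format.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def fake_quant_per_channel(w):
    """Static per-channel weight quantization: one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX  # shape (out, 1)
    q = np.clip(np.round(w / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def fake_quant_per_token(x):
    """Dynamic per-token activation quantization: one scale per token, found at runtime."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX  # shape (tokens, 1)
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))  # weight: 8 output channels, 16 input features
x = rng.standard_normal((4, 16))  # activations: 4 tokens

qw, sw = fake_quant_per_channel(w)
qx, sx = fake_quant_per_token(x)

# The quantized matmul is rescaled by the outer product of the two scale vectors,
# which is why per-token x per-channel scaling folds cheaply into the GEMM epilogue.
deq = (qx @ qw.T) * (sx * sw.T)
ref = x @ w.T
```

Because each token and each output channel gets its own scale, an outlier in one token or channel does not force a coarse scale on the rest, which is the usual motivation for PTPC over per-tensor FP8.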
### Accuracy

<table>
  <tr>
    <td><strong>Benchmark</strong></td>
    <td><strong>DeepSeek-V3.2-Speciale</strong></td>
    <td><strong>DeepSeek-V3.2-Speciale-ptpc (this model)</strong></td>
  </tr>
  <tr>
    <td>gsm8k</td>
    <td>96.00</td>
    <td>95.75</td>
  </tr>
</table>

### Reproduction

- vllm version: 0.11.2.dev521+gad32e3e19.rocm710
- aiter version: 0.1.6.post2.dev55+g59bd8ff2c
- lm_eval version: 0.4.9.2

```shell
export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
model_path="/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc"
vllm serve $model_path \
  --tensor-parallel-size 8 \
  --data-parallel-size 1 \
  --max-num-batched-tokens 32768 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --kv-cache-dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --block-size 1

lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args model=/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc,base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size auto \
  --limit 400
```
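Once the server is up, the OpenAI-compatible endpoint can also be queried directly. A minimal sketch using only the standard library; the base URL and model name mirror the `vllm serve` command above, and the prompt is just a placeholder:

```python
import json
import urllib.request

# These assume the vllm serve command above is running locally.
BASE_URL = "http://127.0.0.1:8000/v1/completions"
MODEL = "/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc"

def build_payload(prompt, max_tokens=32):
    """Assemble an OpenAI-style /v1/completions request body."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0.0}

def complete(prompt):
    """POST the prompt to the server and return the first completion's text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

if __name__ == "__main__":
    print(complete("The capital of France is"))
```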

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.