---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3.2-Speciale
---

**Note that the MTP layers of this model are also PTPC-quantized.**

# Model Overview

- **Model Architecture:** DeepSeek-V3.2-Speciale
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.10)
- **Weight quantization:** Per-channel, FP8E4M3, Static
- **Activation quantization:** Per-token, FP8E4M3, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the deepseek-ai/DeepSeek-V3.2-Speciale model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization.

# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations are quantized to FP8 (E4M3): weights per-channel with static scales, activations per-token with dynamic scales.

### Accuracy

| Benchmark | DeepSeek-V3.2-Speciale | DeepSeek-V3.2-Speciale-ptpc (this model) |
| --- | --- | --- |
| gsm8k | 96.00 | 95.75 |
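
To put the 0.25-point gap in perspective, the scores can be converted back into question counts. The sketch below is illustrative only: it assumes both scores were measured on the same 400-question subset (the `--limit 400` in the reproduction command) and treats each question as an independent trial. The helper names `correct_count` and `standard_error_pct` are hypothetical and not part of lm_eval.

```python
import math


def correct_count(accuracy_pct: float, n_questions: int) -> int:
    """Number of correct answers implied by an accuracy percentage."""
    return round(accuracy_pct / 100 * n_questions)


def standard_error_pct(accuracy_pct: float, n_questions: int) -> float:
    """Binomial standard error of the accuracy, in percentage points."""
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n_questions)


N_QUESTIONS = 400   # assumption: both runs used --limit 400
BASELINE_PCT = 96.00  # DeepSeek-V3.2-Speciale, from the table above
PTPC_PCT = 95.75      # this model, from the table above

baseline_correct = correct_count(BASELINE_PCT, N_QUESTIONS)
ptpc_correct = correct_count(PTPC_PCT, N_QUESTIONS)
gap_questions = baseline_correct - ptpc_correct
baseline_se = standard_error_pct(BASELINE_PCT, N_QUESTIONS)

print(f"baseline correct: {baseline_correct}/{N_QUESTIONS}")
print(f"ptpc correct:     {ptpc_correct}/{N_QUESTIONS}")
print(f"gap in questions: {gap_questions}")
print(f"baseline standard error: {baseline_se:.2f} points")
```

Under these assumptions the two models differ by a single question, which is well inside the roughly one-point standard error of a 400-question sample.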

### Reproduction

- Docker: `rocm/vllm-private:rocm7.1_ubuntu22.04_vllm0.11.2_ptpc_fp8`
- vllm version: 0.11.2.dev521+gad32e3e19.rocm710
- aiter version: 0.1.6.post2.dev55+g59bd8ff2c
- lm_eval version: 0.4.9.2

```
# Environment for the vLLM server on ROCm with AITER kernels enabled
export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1

model_path="/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc"

# Start the server (runs in the foreground)
vllm serve $model_path \
  --tensor-parallel-size 8 \
  --data-parallel-size 1 \
  --max-num-batched-tokens 32768 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --kv-cache-dtype bfloat16 \
  --gpu_memory_utilization 0.85 \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --block-size 1

# In a second shell, once the server is ready, run the evaluation
lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args model=/model_path/deepseek-ai/DeepSeek-V3.2-Speciale-ptpc,base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size auto \
  --limit 400
```

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

# License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.