Qwen3.5-27B-DFlash
This model is still under training.
DFlash is a speculative decoding method that uses a lightweight block diffusion model for drafting. It enables efficient, high-quality parallel drafting that pushes the limits of inference speed.
This model is the drafter component. It must be used in conjunction with the target model Qwen/Qwen3.5-27B. It was trained with a context length of 4096 tokens.
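For intuition, draft-then-verify speculative decoding works roughly like this: the drafter proposes a block of tokens in parallel, the target model checks them, and the longest agreeing prefix is accepted, plus one correction token from the target. The sketch below is a toy greedy variant with stand-in `draft_block` and `target_next` functions, not the actual DFlash drafter or Qwen target (a real implementation verifies all draft positions in a single target forward pass):

```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_block: Callable[[List[int], int], List[int]],
                     target_next: Callable[[List[int]], int],
                     block_size: int = 16) -> List[int]:
    """One round of draft-then-verify greedy speculative decoding.

    The drafter proposes `block_size` tokens at once; the target checks
    them position by position (done here serially for clarity; in
    practice in one batched forward pass) and keeps the longest
    agreeing prefix, so progress is always at least 1 token per round.
    """
    draft = draft_block(prefix, block_size)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # draft token verified
        else:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
    # Entire block accepted: append one bonus token from the target.
    accepted.append(target_next(prefix + accepted))
    return accepted

# Toy models: the "target" deterministically emits (prefix length mod 7);
# the drafter agrees on the first 3 positions, then guesses wrong.
def target_next(prefix: List[int]) -> int:
    return len(prefix) % 7

def draft_block(prefix: List[int], k: int) -> List[int]:
    return [(len(prefix) + i) % 7 if i < 3 else -1 for i in range(k)]

out = speculative_step([1, 2, 3], draft_block, target_next)
print(out)  # 3 verified draft tokens + 1 correction from the target
```

With a stronger drafter, more of each 16-token block survives verification, which is exactly what the accept-length numbers below measure.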
🚀 Quick Start
SGLang
Installation
```shell
uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/20547/head#subdirectory=python"
```
Inference
```shell
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_ENABLE_DFLASH_SPEC_V2=1
export SGLANG_ENABLE_OVERLAP_PLAN_STREAM=1

python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-27B \
  --speculative-algorithm DFLASH \
  --speculative-draft-model-path z-lab/Qwen3.5-27B-DFlash \
  --speculative-num-draft-tokens 16 \
  --tp-size 1 \
  --attention-backend fa3 \
  --mem-fraction-static 0.75 \
  --mamba-scheduler-strategy extra_buffer \
  --trust-remote-code
```
Note: For long-context or agentic usage, consider adding `--speculative-dflash-draft-window-size WINDOW_SIZE` to enable sliding-window attention for the draft model.
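Once the server is up, speculative decoding is transparent to clients: requests go to SGLang's OpenAI-compatible endpoint (port 30000 by default) exactly as they would without a drafter. A minimal request sketch, assuming the default host and port (the prompt is illustrative):

```python
import json
import urllib.request

# Standard OpenAI-compatible chat completion payload; the model name is
# the target model as served above.
payload = {
    "model": "Qwen/Qwen3.5-27B",
    "messages": [
        {"role": "user",
         "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 256,
}

def build_request(url: str = "http://localhost:30000/v1/chat/completions"):
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request()
print(req.full_url)
# Send with urllib.request.urlopen(req) once the server is running.
```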
vLLM
Thanks to the community and all contributors! Check out the following PRs to see how to run DFlash on vLLM: #36847 and #36767.
Early Results
- Thinking: enabled
- Max new tokens: 4096
- Block size: 16
- Checkpoint: 2.2 epochs
| Dataset   | Accept Length |
|-----------|---------------|
| GSM8K     | 6.80          |
| Math500   | 7.46          |
| HumanEval | 8.50          |
| MBPP      | 6.76          |
| MT-Bench  | 5.14          |
| Alpaca    | 4.74          |
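As a rough illustration of what these numbers mean: under the idealized assumption that drafting cost is negligible relative to the target's forward pass, an accept length of τ means each target forward pass yields about τ tokens instead of 1, so τ is an upper bound on the speedup. Averaging the table's values:

```python
# Accept lengths from the table above.
accept_length = {
    "GSM8K": 6.80, "Math500": 7.46, "HumanEval": 8.50,
    "MBPP": 6.76, "MT-Bench": 5.14, "Alpaca": 4.74,
}

mean_tau = sum(accept_length.values()) / len(accept_length)
print(f"mean accept length: {mean_tau:.2f}")
# Under the free-drafting assumption, the upper-bound speedup over
# plain autoregressive decoding equals this mean accept length.
```

Real speedups are lower, since the draft model and verification both add overhead.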