Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, which features the following key enhancements:
- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet.
- **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.
- **Agentic Coding** support for most platforms such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
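
As a quick sanity check on the figures above (illustrative arithmetic, not from the model card): **256K** corresponds to exactly 262,144 tokens, the context value used in the deployment commands below.

```python
# Illustrative: the advertised context windows as exact token counts.
native_context = 256 * 1024   # native window: 256K tokens
yarn_context = 1024 * 1024    # YaRN-extended window: 1M tokens

print(native_context)  # 262144 -- matches --context-length / --max-model-len below
print(yarn_context)    # 1048576
```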

```python
content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```
For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
- SGLang:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --tp 8 --enable-ep-moe --context-length 262144
```
- vLLM:
```shell
vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 262144
```
**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
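
Once a server from either command above is running, any OpenAI-compatible client can talk to it. A minimal sketch using only the standard library (assumptions: vLLM's default port `8000` and the `/v1/chat/completions` path; SGLang defaults to port `30000`):

```python
import json
import urllib.request

# Assumption: a server started with one of the commands above is listening on
# localhost:8000 (vLLM's default; adjust the port for SGLang).
payload = {
    "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Write a quicksort in Python."}],
    "max_tokens": 512,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(req.full_url)
```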
For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
## Note on FP8
For convenience and performance, we have provided an `fp8`-quantized model checkpoint for Qwen3, whose name ends with `-FP8`. The quantization method is fine-grained `fp8` quantization with a block size of 128. You can find more details in the `quantization_config` field in `config.json`.
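
To build intuition for what fine-grained block-wise quantization does, here is a toy sketch (illustrative only: integer rounding stands in for the actual fp8 e4m3 cast, and this is not the kernel Qwen uses):

```python
import numpy as np

BLOCK = 128  # block size, matching the value reported in quantization_config

def quantize_blockwise(w, fp8_max=448.0):
    # 448 is the largest finite value representable in float8 e4m3.
    # Each contiguous block of 128 weights gets its own scale factor.
    blocks = w.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / fp8_max
    q = np.round(blocks / scales)  # would be a cast to fp8 in practice
    return q, scales

def dequantize_blockwise(q, scales, shape):
    return (q * scales).reshape(shape)

w = np.random.randn(4, 256).astype(np.float32)
q, s = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, w.shape)
print(np.abs(w - w_hat).max())  # small per-element reconstruction error
```

Because each 128-element block is scaled independently, an outlier weight only degrades the precision of its own block rather than the whole tensor, which is the motivation for fine-grained schemes over per-tensor scaling.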