Kassadin88 committed on
Commit 88ea406 · verified · 1 Parent(s): 8afcaa2

Update README with training data and benchmark details

  1. README.md +146 -119
README.md CHANGED
@@ -1,31 +1,123 @@
1
  ---
 
2
  license: apache-2.0
3
- language:
4
- - en
5
- - zh
6
- base_model: Qwen/Qwen3.5-9B
7
  tags:
8
  - code
9
  - instruction-tuned
10
  - qwen
11
  - python
12
- - software-engineering
13
- library_name: transformers
 
14
  ---
15
 
16
  # Nemotron-9B-OpenCode
17
 
18
- A 9B parameter instruction-tuned model for software engineering tasks, fine-tuned from Qwen3.5-9B on high-quality code instruction data.
19
 
20
  ## Model Description
21
 
22
- - **Developed by:** [Kassadin88](https://huggingface.co/Kassadin88)
23
- - **Model type:** Causal Language Model
24
- - **Language(s):** English, Chinese
25
- - **Base model:** [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
26
- - **License:** Apache 2.0
27
 
28
- ## 🚀 Quick Start
29
 
30
  ```python
31
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -52,8 +144,6 @@ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
52
  outputs = model.generate(
53
  **inputs,
54
  max_new_tokens=512,
55
- temperature=0.7,
56
- top_p=0.9,
57
  do_sample=True
58
  )
59
 
@@ -61,88 +151,7 @@ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special
61
  print(response)
62
  ```
63
 
64
- ## 📊 Base Model Performance (Qwen3.5-9B)
65
-
66
- ### Language Benchmarks
67
-
68
- | Category | Benchmark | Score |
69
- |----------|-----------|-------|
70
- | **Knowledge & STEM** | MMLU-Pro | 82.5 |
71
- | | MMLU-Redux | 91.1 |
72
- | | C-Eval | 88.2 |
73
- | | GPQA Diamond | 81.7 |
74
- | **Instruction Following** | IFEval | 91.5 |
75
- | | MultiChallenge | 54.5 |
76
- | **Long Context** | AA-LCR | 63.0 |
77
- | | LongBench v2 | 55.2 |
78
- | **Reasoning & Coding** | HMMT Feb 25 | 83.2 |
79
- | | LiveCodeBench v6 | 65.6 |
80
- | **Multilingualism** | MMMLU | 81.2 |
81
- | | MMLU-ProX | 76.3 |
82
-
83
- ### Vision Language Benchmarks
84
-
85
- | Category | Benchmark | Score |
86
- |----------|-----------|-------|
87
- | **STEM and Puzzle** | MMMU | 78.4 |
88
- | | MathVision | 78.9 |
89
- | | MathVista (mini) | 85.7 |
90
- | **General VQA** | RealWorldQA | 80.3 |
91
- | | MMStar | 79.7 |
92
- | **Document Understanding** | OmniDocBench1.5 | 87.7 |
93
- | | OCRBench | 89.2 |
94
- | **Video Understanding** | VideoMME (w/ sub) | 84.5 |
95
- | | MLVU | 84.4 |
96
-
97
- ## 📈 Training Details
98
-
99
- The model was full-parameter fine-tuned from Qwen3.5-9B using DeepSpeed ZeRO3 with BF16 precision.
100
-
101
- ### Training Results
102
-
103
- | Epoch | Train Loss | Eval Loss | Token Accuracy |
104
- |-------|------------|-----------|----------------|
105
- | 1.0 | 0.335 | 0.335 | 88.4% |
106
- | 2.0 | 0.317 | 0.317 | 89.0% |
107
- | 3.0 | **0.315** | **0.315** | **89.2%** |
108
-
109
- ## 📦 Training Data
110
-
111
- The model was trained on **Nemotron-SFT-OpenCode-v1**, a curated dataset containing 144,468 high-quality code instruction samples covering:
112
-
113
- - Software engineering tasks
114
- - Code generation and explanation
115
- - Debugging and code review
116
- - API usage and documentation
117
- - Multi-language programming (Python, JavaScript, TypeScript, etc.)
118
-
119
- ## 💻 Usage Tips
120
-
121
- ### For Code Generation
122
-
123
- ```python
124
- outputs = model.generate(
125
- **inputs,
126
- max_new_tokens=1024,
127
- temperature=0.3,
128
- top_p=0.95,
129
- do_sample=True
130
- )
131
- ```
132
-
133
- ### For Code Explanation
134
-
135
- ```python
136
- outputs = model.generate(
137
- **inputs,
138
- max_new_tokens=512,
139
- temperature=0.7,
140
- top_p=0.9,
141
- do_sample=True
142
- )
143
- ```
144
-
145
- ### With vLLM (Recommended for Production)
146
 
147
  ```python
148
  from vllm import LLM, SamplingParams
@@ -154,22 +163,19 @@ llm = LLM(
154
  )
155
 
156
  sampling_params = SamplingParams(
157
- temperature=0.3,
158
- top_p=0.95,
159
  max_tokens=1024
160
  )
161
 
162
  outputs = llm.generate(prompts, sampling_params)
163
  ```
164
 
165
- ### With SGLang
166
 
167
  ```bash
168
  python -m sglang.launch_server \
169
  --model-path Kassadin88/Nemotron-9B-OpenCode \
170
  --port 8000 \
171
- --tp-size 1 \
172
- --context-length 16384
173
  ```
174
 
175
  ### OpenAI-Compatible API
@@ -187,44 +193,65 @@ response = client.chat.completions.create(
187
  messages=[
188
  {"role": "user", "content": "Write a quicksort implementation in Python"}
189
  ],
190
- max_tokens=512,
191
- temperature=0.7,
192
- top_p=0.9
193
  )
194
  print(response.choices[0].message.content)
195
  ```
196
 
197
- ## 🔧 Recommended Sampling Parameters
198
 
199
- | Task Type | Temperature | Top-p | Top-k |
200
- |-----------|-------------|-------|-------|
201
- | Code Generation | 0.3 | 0.95 | 20 |
202
- | Code Explanation | 0.7 | 0.9 | 20 |
203
- | Debugging | 0.5 | 0.95 | 20 |
204
- | General Tasks | 0.7 | 0.8 | 20 |
205
 
206
- ## ⚠️ Limitations
207
 
208
- - The model is primarily trained on code and may not perform well on general conversational tasks
209
  - May occasionally generate incorrect or incomplete code
210
  - Should not be used for malicious code generation
211
 
212
- ## 📝 Citation
213
 
214
  ```bibtex
215
  @misc{nemotron-9b-opencode,
216
  author = {Kassadin88},
217
- title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Software Engineering},
218
  year = {2026},
219
  publisher = {HuggingFace},
220
  url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
221
  }
222
  ```
223
 
224
- ## 🙏 Acknowledgments
225
 
226
- - Base model: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
227
- - Training framework: [MS-Swift](https://github.com/modelscope/swift)
 
228
 
229
  ---
230
 
 
1
  ---
2
+ library_name: transformers
3
  license: apache-2.0
4
+ license_link: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
5
+ pipeline_tag: image-text-to-text
6
+ base_model:
7
+ - Qwen/Qwen3.5-9B
8
  tags:
9
  - code
10
  - instruction-tuned
11
+ - software-engineering
12
+ - agent
13
+ - opencode
14
  - qwen
15
  - python
16
+ language:
17
+ - en
18
+ - zh
19
  ---
20
 
21
  # Nemotron-9B-OpenCode
22
 
23
+ A 9B-parameter instruction-tuned model specialized for **autonomous software engineering agents**, fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.
24
+
25
+ ## Model Highlights
26
+
27
+ - **Specialized for Agentic Tasks**: Trained on agent trajectories for the [OpenCode](https://opencode.ai/) CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
28
+ - **Multi-Capability**: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
29
+ - **Production Ready**: Compatible with Hugging Face Transformers, vLLM, SGLang, and OpenAI-compatible APIs
30
 
31
  ## Model Description
32
 
33
+ | Property | Value |
34
+ |----------|-------|
35
+ | **Base Model** | Qwen3.5-9B |
36
+ | **Model Type** | Causal Language Model with Vision Encoder |
37
+ | **Parameters** | 9B |
38
+ | **Languages** | English, Chinese |
39
+ | **License** | Apache 2.0 |
40
+ | **Developer** | [Kassadin88](https://huggingface.co/Kassadin88) |
41
+
42
+ ## Training Data
43
+
44
+ This model was fine-tuned on **[Nemotron-SFT-OpenCode-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1)**, NVIDIA's agentic instruction-tuning dataset containing **144,468 high-quality samples** derived from 459K total trajectories. The dataset is intended to improve an LLM's ability to operate within autonomous coding environments.
45
+
46
+ ### Dataset Composition
47
+
48
+ | Subset | Samples | Description |
49
+ |--------|---------|-------------|
50
+ | `general` | 90K | General agentic CLI questions with/without AGENTS.md context |
51
+ | `bash_only_tool` | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
52
+ | `bash_only_tool_skills` | 96K | Bash + skill loading for dynamic capability discovery |
53
+ | `question_tool` | 76K | Interactive clarification via user questions during task execution |
54
+ | `agent_skills` | 67K | Dynamic skill scanning and loading for task-specific capabilities |
55
+ | `agent_skills_question_tool` | 33K | Combined skill loading + user clarification for complex tasks |
56
+
57
+ ### Key Capabilities Trained
58
+
59
+ - **Code Navigation**: Repository-aware reasoning and codebase traversal
60
+ - **Tool Calling**: Structured tool invocation for bash, file operations, and more
61
+ - **Skill Loading**: Dynamic discovery and loading of relevant agent skills
62
+ - **Interactive Planning**: User clarification when requirements are ambiguous
63
+ - **Multi-Step Reasoning**: SWE-Bench style problem decomposition and implementation
64
+
65
+ ## Benchmark Results
66
+
67
+ The model builds on Qwen3.5-9B. The scores below are for the base model, not for this fine-tune:
68
+
69
+ ### Language Benchmarks
70
 
71
+ <div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
72
+ <table style="width:100%;border-collapse:collapse;font-size:13px">
73
+ <thead><tr>
74
+ <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
75
+ <th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
76
+ <th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
77
+ </tr></thead>
78
+ <tbody>
79
+ <tr><td rowspan="5" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Knowledge & STEM</td></tr>
80
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Pro</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td></tr>
81
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Redux</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.1</td></tr>
82
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">C-Eval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td></tr>
83
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">GPQA Diamond</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.7</td></tr>
84
+ <tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Instruction Following</td></tr>
85
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFEval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td></tr>
86
+ <tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Long Context</td></tr>
87
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td></tr>
88
+ <tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr>
89
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td></tr>
90
+ </tbody>
91
+ </table>
92
+ </div>
93
+
94
+ ### Vision Language Benchmarks
95
+
96
+ <div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
97
+ <table style="width:100%;border-collapse:collapse;font-size:13px">
98
+ <thead><tr>
99
+ <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
100
+ <th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
101
+ <th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
102
+ </tr></thead>
103
+ <tbody>
104
+ <tr><td rowspan="4" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">STEM & Puzzle</td></tr>
105
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td></tr>
106
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td></tr>
107
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVista (mini)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td></tr>
108
+ <tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Document Understanding</td></tr>
109
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td></tr>
110
+ <tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr>
111
+ <tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME (w/ sub)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td></tr>
112
+ </tbody>
113
+ </table>
114
+ </div>
115
+
116
+ > **Note**: For complete benchmark results across all categories, please refer to the [Qwen3.5-9B model card](https://huggingface.co/Qwen/Qwen3.5-9B).
117
+
118
+ ## Quick Start
119
+
120
+ ### Using Transformers
121
 
122
  ```python
123
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
144
  outputs = model.generate(
145
  **inputs,
146
  max_new_tokens=512,
 
 
147
  do_sample=True
148
  )
149
 
 
151
  print(response)
152
  ```
153
 
154
+ ### Using vLLM (Recommended for Production)
155
 
156
  ```python
157
  from vllm import LLM, SamplingParams
 
163
  )
164
 
165
  sampling_params = SamplingParams(
 
 
166
  max_tokens=1024
167
  )
168
 
169
  outputs = llm.generate(prompts, sampling_params)
170
  ```
171
 
172
+ ### Using SGLang
173
 
174
  ```bash
175
  python -m sglang.launch_server \
176
  --model-path Kassadin88/Nemotron-9B-OpenCode \
177
  --port 8000 \
178
+ --tp-size 1
 
179
  ```
180
 
181
  ### OpenAI-Compatible API
 
193
  messages=[
194
  {"role": "user", "content": "Write a quicksort implementation in Python"}
195
  ],
196
+ max_tokens=512
 
 
197
  )
198
  print(response.choices[0].message.content)
199
  ```
200
 
201
+ ## Usage Tips
202
+
203
+ ### For Agentic Coding Tasks
204
+
205
+ ```python
206
+ messages = [
207
+ {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
208
+ {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."}
209
+ ]
210
+ ```
211
+
212
+ ### For Code Generation
213
 
214
+ ```python
215
+ outputs = model.generate(
216
+ **inputs,
217
+ max_new_tokens=1024,
218
+ do_sample=True
219
+ )
220
+ ```
221
+
222
+ ### For Code Explanation
223
+
224
+ ```python
225
+ outputs = model.generate(
226
+ **inputs,
227
+ max_new_tokens=512,
228
+ do_sample=True
229
+ )
230
+ ```
231
 
232
+ ## Limitations
233
 
234
+ - The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
235
  - May occasionally generate incorrect or incomplete code
236
  - Should not be used for malicious code generation
237
 
238
+ ## Citation
239
 
240
  ```bibtex
241
  @misc{nemotron-9b-opencode,
242
  author = {Kassadin88},
243
+ title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
244
  year = {2026},
245
  publisher = {HuggingFace},
246
  url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
247
  }
248
  ```
249
 
250
+ ## Acknowledgments
251
 
252
+ - **Base Model**: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
253
+ - **Training Data**: [NVIDIA](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1) for Nemotron-SFT-OpenCode-v1
254
+ - **Training Framework**: [MS-Swift](https://github.com/modelscope/swift)
255
 
256
  ---
257
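
As a quick sanity check on the Dataset Composition table added in this commit, the rounded subset sizes can be summed and compared with the 459K total that the new README states. The sketch below is illustrative only: the dictionary and variable names are made up for this example, the per-subset counts are the rounded values from the table, and treating 144,468 as a portion of the 459K trajectories is an assumption based on the README's wording.

```python
# Subset sizes in thousands, copied from the Dataset Composition table.
# These are rounded values, so any total derived from them is approximate.
subset_sizes_k = {
    "general": 90,
    "bash_only_tool": 97,
    "bash_only_tool_skills": 96,
    "question_tool": 76,
    "agent_skills": 67,
    "agent_skills_question_tool": 33,
}

# Sum of the rounded subset sizes, in thousands of trajectories.
total_k = sum(subset_sizes_k.values())

# Sample count quoted in the Training Data section of the README.
samples_quoted = 144_468

# Assumed interpretation: the quoted samples are a portion of all trajectories.
fraction_used = samples_quoted / (total_k * 1000)

print(f"Total trajectories (rounded): {total_k}K")
print(f"Quoted samples as a share of the total: {fraction_used:.1%}")
```

The rounded subset sizes add up to the 459K figure in the README, which suggests the table's "Samples" column counts trajectories in the full dataset rather than the 144,468 samples mentioned in the same section.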