Instructions for using Kassadin88/Nemotron-9B-OpenCode with libraries and local apps.

Libraries

How to use Kassadin88/Nemotron-9B-OpenCode with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Kassadin88/Nemotron-9B-OpenCode")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Kassadin88/Nemotron-9B-OpenCode")
model = AutoModelForImageTextToText.from_pretrained("Kassadin88/Nemotron-9B-OpenCode")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
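
The auto-generated example above sends an image question, while this model is aimed at coding tasks. A minimal sketch of a text-only prompt in the same message format follows. The helper name `build_text_messages` is hypothetical, and it assumes the model's chat template accepts content lists that contain only text parts.

```python
def build_text_messages(prompt, system=None):
    """Return a chat-format message list containing text parts only.

    Hypothetical helper: it mirrors the message structure used above but
    omits the image entry.
    """
    messages = []
    if system is not None:
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system}]}
        )
    messages.append(
        {"role": "user", "content": [{"type": "text", "text": prompt}]}
    )
    return messages


messages = build_text_messages("Write a Python function that reverses a linked list.")
print(messages)
```

The resulting `messages` list can be passed to `processor.apply_chat_template(...)` in place of the image example.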

Local Apps

vLLM

How to use Kassadin88/Nemotron-9B-OpenCode with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kassadin88/Nemotron-9B-OpenCode"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Nemotron-9B-OpenCode",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
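
The same request can be built from Python with only the standard library. This is a sketch, not the project's client: the function names `build_chat_request` and `send_chat_request` are hypothetical, the default URL assumes the vLLM server started above is listening on port 8000, and the response parsing assumes the usual OpenAI-compatible response shape.

```python
import json
import urllib.request


def build_chat_request(
    prompt,
    base_url="http://localhost:8000",
    model="Kassadin88/Nemotron-9B-OpenCode",
    max_tokens=512,
):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's message text.

    Assumes an OpenAI-compatible response body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Write a quicksort implementation in Python")
# print(send_chat_request(request))  # uncomment once the server is running
```

Separating request construction from sending makes the payload easy to inspect before any server is running. Passing a different `base_url` should target another OpenAI-compatible server in the same way.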

Use Docker

docker model run hf.co/Kassadin88/Nemotron-9B-OpenCode

SGLang

How to use Kassadin88/Nemotron-9B-OpenCode with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kassadin88/Nemotron-9B-OpenCode" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Nemotron-9B-OpenCode",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kassadin88/Nemotron-9B-OpenCode" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown above for the pip-installed SGLang server (port 30000).

Docker Model Runner
How to use Kassadin88/Nemotron-9B-OpenCode with Docker Model Runner:
```
docker model run hf.co/Kassadin88/Nemotron-9B-OpenCode
```

Kassadin88 committed on Apr 9

Commit 88ea406 (verified) · 1 parent: 8afcaa2

Update README with training data and benchmark details

Files changed (1)

README.md +146 -119

@@ -1,31 +1,123 @@
 ---
 license: apache-2.0
-language:
-- en
-- zh
-base_model: Qwen/Qwen3.5-9B
 tags:
 - code
 - instruction-tuned
 - qwen
 - python
-- software-engineering
-library_name: transformers
 ---
 # Nemotron-9B-OpenCode
-A 9B parameter instruction-tuned model for software engineering tasks, fine-tuned from Qwen3.5-9B on high-quality code instruction data.
 ## Model Description
-- **Developed by:** [Kassadin88](https://huggingface.co/Kassadin88)
-- **Model type:** Causal Language Model
-- **Language(s):** English, Chinese
-- **Base model:** [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
-- **License:** Apache 2.0
-## 🚀 Quick Start
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -52,8 +144,6 @@ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
 outputs = model.generate(
     **inputs,
     max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.9,
     do_sample=True
 )
@@ -61,88 +151,7 @@ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special
 print(response)
 ```
-## 📊 Base Model Performance (Qwen3.5-9B)
-### Language Benchmarks
-| Category | Benchmark | Score |
-|----------|-----------|-------|
-| **Knowledge & STEM** | MMLU-Pro | 82.5 |
-| | MMLU-Redux | 91.1 |
-| | C-Eval | 88.2 |
-| | GPQA Diamond | 81.7 |
-| **Instruction Following** | IFEval | 91.5 |
-| | MultiChallenge | 54.5 |
-| **Long Context** | AA-LCR | 63.0 |
-| | LongBench v2 | 55.2 |
-| **Reasoning & Coding** | HMMT Feb 25 | 83.2 |
-| | LiveCodeBench v6 | 65.6 |
-| **Multilingualism** | MMMLU | 81.2 |
-| | MMLU-ProX | 76.3 |
-### Vision Language Benchmarks
-| Category | Benchmark | Score |
-|----------|-----------|-------|
-| **STEM and Puzzle** | MMMU | 78.4 |
-| | MathVision | 78.9 |
-| | Mathvista (mini) | 85.7 |
-| **General VQA** | RealWorldQA | 80.3 |
-| | MMStar | 79.7 |
-| **Document Understanding** | OmniDocBench1.5 | 87.7 |
-| | OCRBench | 89.2 |
-| **Video Understanding** | VideoMME (w/ sub) | 84.5 |
-| | MLVU | 84.4 |
-## 📈 Training Details
-The model was full-parameter fine-tuned from Qwen3.5-9B using DeepSpeed ZeRO3 with BF16 precision.
-### Training Results
-| Epoch | Train Loss | Eval Loss | Token Accuracy |
-|-------|------------|-----------|----------------|
-| 1.0 | 0.335 | 0.335 | 88.4% |
-| 2.0 | 0.317 | 0.317 | 89.0% |
-| 3.0 | **0.315** | **0.315** | **89.2%** |
-## 📦 Training Data
-The model was trained on **Nemotron-SFT-OpenCode-v1**, a curated dataset containing 144,468 high-quality code instruction samples covering:
-- Software engineering tasks
-- Code generation and explanation
-- Debugging and code review
-- API usage and documentation
-- Multi-language programming (Python, JavaScript, TypeScript, etc.)
-## 💻 Usage Tips
-### For Code Generation
-```python
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=1024,
-    temperature=0.3,
-    top_p=0.95,
-    do_sample=True
-)
-```
-### For Code Explanation
-```python
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.9,
-    do_sample=True
-)
-```
-### With vLLM (Recommended for Production)
 ```python
 from vllm import LLM, SamplingParams
@@ -154,22 +163,19 @@ llm = LLM(
 )
 sampling_params = SamplingParams(
-    temperature=0.3,
-    top_p=0.95,
     max_tokens=1024
 )
 outputs = llm.generate(prompts, sampling_params)
 ```
-### With SGLang
 ```bash
 python -m sglang.launch_server \
     --model-path Kassadin88/Nemotron-9B-OpenCode \
     --port 8000 \
-    --tp-size 1 \
-    --context-length 16384
 ```
 ### OpenAI-Compatible API
@@ -187,44 +193,65 @@ response = client.chat.completions.create(
     messages=[
         {"role": "user", "content": "Write a quicksort implementation in Python"}
     ],
-    max_tokens=512,
-    temperature=0.7,
-    top_p=0.9
 )
 print(response.choices[0].message.content)
 ```
-## 🔧 Recommended Sampling Parameters
-| Task Type | Temperature | Top-p | Top-k |
-|-----------|-------------|-------|-------|
-| Code Generation | 0.3 | 0.95 | 20 |
-| Code Explanation | 0.7 | 0.9 | 20 |
-| Debugging | 0.5 | 0.95 | 20 |
-| General Tasks | 0.7 | 0.8 | 20 |
-## ⚠️ Limitations
-- The model is primarily trained on code and may not perform well on general conversational tasks
 - May occasionally generate incorrect or incomplete code
 - Should not be used for malicious code generation
-## 📝 Citation
 ```bibtex
 @misc{nemotron-9b-opencode,
   author = {Kassadin88},
-  title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Software Engineering},
   year = {2026},
   publisher = {HuggingFace},
   url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
 }
 ```
-## 🙏 Acknowledgments
-- Base model: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
-- Training framework: [MS-Swift](https://github.com/modelscope/swift)
 ---

 ---
+library_name: transformers
 license: apache-2.0
+license_link: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
+pipeline_tag: image-text-to-text
+base_model:
+- Qwen/Qwen3.5-9B
 tags:
 - code
 - instruction-tuned
+- software-engineering
+- agent
+- opencode
 - qwen
 - python
+language:
+- en
+- zh
 ---
 # Nemotron-9B-OpenCode
+A 9B parameter instruction-tuned model specialized for **autonomous software engineering agents**, fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.
+## Model Highlights
+- **Specialized for Agentic Tasks**: Trained on agent trajectories for the [OpenCode](https://opencode.ai/) CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
+- **Multi-Capability**: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
+- **Production Ready**: Compatible with Hugging Face Transformers, vLLM, SGLang, and OpenAI-compatible APIs
 ## Model Description
+| Property | Value |
+|----------|-------|
+| **Base Model** | Qwen3.5-9B |
+| **Model Type** | Causal Language Model with Vision Encoder |
+| **Parameters** | 9B |
+| **Languages** | English, Chinese |
+| **License** | Apache 2.0 |
+| **Developer** | [Kassadin88](https://huggingface.co/Kassadin88) |
+## Training Data
+This model was fine-tuned on **[Nemotron-SFT-OpenCode-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1)**, NVIDIA's agentic instruction tuning dataset containing **144,468 high-quality samples** derived from 459K total trajectories. The dataset enhances LLMs' ability to operate within autonomous coding environments.
+### Dataset Composition
+| Subset | Samples | Description |
+|--------|---------|-------------|
+| `general` | 90K | General agentic CLI questions with/without AGENTS.md context |
+| `bash_only_tool` | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
+| `bash_only_tool_skills` | 96K | Bash + skill loading for dynamic capability discovery |
+| `question_tool` | 76K | Interactive clarification via user questions during task execution |
+| `agent_skills` | 67K | Dynamic skill scanning and loading for task-specific capabilities |
+| `agent_skills_question_tool` | 33K | Combined skill loading + user clarification for complex tasks |
+### Key Capabilities Trained
+- **Code Navigation**: Repository-aware reasoning and codebase traversal
+- **Tool Calling**: Structured tool invocation for bash, file operations, and more
+- **Skill Loading**: Dynamic discovery and loading of relevant agent skills
+- **Interactive Planning**: User clarification when requirements are ambiguous
+- **Multi-Step Reasoning**: SWE-Bench style problem decomposition and implementation
+## Benchmark Results
+The model inherits strong foundational capabilities from Qwen3.5-9B. Below are the base model's benchmark performances:
+### Language Benchmarks
+<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
+<table style="width:100%;border-collapse:collapse;font-size:13px">
+<thead><tr>
+<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
+<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
+<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
+</tr></thead>
+<tbody>
+<tr><td rowspan="5" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Knowledge & STEM</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Pro</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Redux</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.1</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">C-Eval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">GPQA Diamond</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.7</td></tr>
+<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Instruction Following</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFEval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td></tr>
+<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Long Context</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td></tr>
+<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td></tr>
+</tbody>
+</table>
+</div>
+### Vision Language Benchmarks
+<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
+<table style="width:100%;border-collapse:collapse;font-size:13px">
+<thead><tr>
+<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
+<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
+<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
+</tr></thead>
+<tbody>
+<tr><td rowspan="4" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">STEM & Puzzle</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Mathvista (mini)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td></tr>
+<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Document Understanding</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td></tr>
+<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr>
+<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME (w/ sub)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td></tr>
+</tbody>
+</table>
+</div>
+> **Note**: For complete benchmark results across all categories, please refer to the [Qwen3.5-9B model card](https://huggingface.co/Qwen/Qwen3.5-9B).
+## Quick Start
+### Using Transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 outputs = model.generate(
     **inputs,
     max_new_tokens=512,
     do_sample=True
 )
 print(response)
 ```
+### Using vLLM (Recommended for Production)
 ```python
 from vllm import LLM, SamplingParams
 )
 sampling_params = SamplingParams(
     max_tokens=1024
 )
 outputs = llm.generate(prompts, sampling_params)
 ```
+### Using SGLang
 ```bash
 python -m sglang.launch_server \
     --model-path Kassadin88/Nemotron-9B-OpenCode \
     --port 8000 \
+    --tp-size 1
 ```
 ### OpenAI-Compatible API
     messages=[
         {"role": "user", "content": "Write a quicksort implementation in Python"}
     ],
+    max_tokens=512
 )
 print(response.choices[0].message.content)
 ```
+## Usage Tips
+### For Agentic Coding Tasks
+```python
+messages = [
+    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
+    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."}
+]
+```
+### For Code Generation
+```python
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=1024,
+    do_sample=True
+)
+```
+### For Code Explanation
+```python
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    do_sample=True
+)
+```
+## Limitations
+- The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
 - May occasionally generate incorrect or incomplete code
 - Should not be used for malicious code generation
+## Citation
 ```bibtex
 @misc{nemotron-9b-opencode,
   author = {Kassadin88},
+  title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
   year = {2026},
   publisher = {HuggingFace},
   url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
 }
 ```
+## Acknowledgments
+- **Base Model**: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
+- **Training Data**: [NVIDIA](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1) for Nemotron-SFT-OpenCode-v1
+- **Training Framework**: [MS-Swift](https://github.com/modelscope/swift)
 ---