Neotoi Coder

A Rust / Dioxus 0.7 specialist LLM. v3.1 ships in three sizes – 15B, 8B, and 4B – all fine-tuned via RAFT (Retrieval-Augmented Fine-Tuning) on Qwen3 base models. Optimized for production-quality Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.

All three are current. They were trained on the same v3.1 dataset, examined with the same spec exam, and ship together. Pick based on your hardware, not on which is newest.

Variants

| Variant | Repo | Base | Params | Q4_K_M | Spec exam (103Q weighted, max 144.5) |
|---|---|---|---|---|---|
| 8B (flagship) | rockypod/neotoi-coder-8b | Qwen3-8B | 8.2B (6.95B non-embed) | 4.68 GB | 144.5 / 144.5 – 100.00% |
| 4B | rockypod/neotoi-coder-4b | Qwen3-4B | 4.0B (3.6B non-embed, tied embeddings) | 2.33 GB | 143.5 / 144.5 – 99.31% |
| 15B | this repo (rockypod/neotoi-coder) | Qwen3-Coder-14B | 14.8B (13.2B non-embed) | 8.40 GB | 137.0 / 144.5 – 94.81% |

All three clear the 90% publication bar, and the 8B and 4B also clear the 95% release bar, with all per-tier floors PASS. The 8B is the recommended default; pick the 4B if disk or RAM is tight (or for ~40% faster generation), and the 15B for the broadest coverage and the most context-rich generations.

Each variant lives in its own model repo for searchability. This page (rockypod/neotoi-coder) is the family hub and hosts the 15B GGUFs.

Install via Ollama

# 8B β€” recommended default
ollama pull rockypod/neotoi-coder:8b

# 4B β€” disk / RAM constrained, ~40% faster generation
ollama pull rockypod/neotoi-coder:4b

# 15B β€” largest, broadest coverage
ollama pull rockypod/neotoi-coder:15b

Spec-exam scorecard β€” all three variants

Re-graded 2026-04-26 with the patched run_grade_v31.py (Q87 now also accepts LANG() / THEME() GlobalSignal accessor calls in addition to the literal Signal token – a false-negative fix that unlocked the 8B's perfect score).
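For context, a minimal sketch of the accessor-call form the patched grader now accepts, assuming the Dioxus GlobalSignal API; the `LANG` global and `Banner` component are hypothetical, and the snippet needs the dioxus 0.7 crate:

```rust
use dioxus::prelude::*;

// Hypothetical i18n global in the LANG / THEME pattern the exam probes.
static LANG: GlobalSignal<&'static str> = Signal::global(|| "en");

#[component]
fn Banner() -> Element {
    // Accessor-call form, e.g. LANG(), which the patched grader now
    // accepts alongside answers spelling out the Signal type.
    let lang = LANG();
    rsx! { p { "Current language: {lang}" } }
}
```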

| Tier | Max wt | 8B | 4B | 15B |
|---|---|---|---|---|
| T1 Fundamentals | 12.0 | 12.0 ✅ | 11.0 ⚠️ 91.7% | 12.0 ✅ |
| T2 RSX Syntax | 12.0 | 12.0 ✅ | 12.0 ✅ | 10.0 ⚠️ 83.3% |
| T3 Signal Hygiene | 12.0 | 12.0 ✅ | 12.0 ✅ | 11.0 ✅ 91.7% |
| T4 WCAG / ARIA | 21.0 | 21.0 ✅ | 21.0 ✅ | 16.5 ⚠️ 78.6% |
| T5 use_resource | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
| T6 Hard Reasoning | 20.0 | 20.0 ✅ | 20.0 ✅ | 20.0 ✅ |
| T7 Primitives + CSS | 18.0 | 18.0 ✅ | 18.0 ✅ | 18.0 ✅ |
| T8 GlobalSignal / i18n | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
| T9 Static Navigator | 9.0 | 9.0 ✅ | 9.0 ✅ | 9.0 ✅ |
| T10 Dioxus 0.7.4 | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
| T11 Server Functions | 4.5 | 4.5 ✅ | 4.5 ✅ | 4.5 ✅ |
| Total weighted | 144.5 | 144.5 | 143.5 | 137.0 |
| Total raw (of 103) | – | 103 | 102 | 97 |
| Percent | – | 100.00% | 99.31% | 94.81% |
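The Percent row is just the weighted total over 144.5, rounded to two decimals; a quick standalone check (plain Rust, no model needed):

```rust
/// Score as a percentage of the exam maximum, rounded to two decimals.
fn pct(score: f64, max: f64) -> f64 {
    (score / max * 10_000.0).round() / 100.0
}

fn main() {
    // Weighted totals from the scorecard above.
    for (variant, score) in [("8B", 144.5), ("4B", 143.5), ("15B", 137.0)] {
        println!("{variant}: {:.2}%", pct(score, 144.5));
        // 8B: 100.00%, 4B: 99.31%, 15B: 94.81%
    }
}
```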

Tier floors (82% on weight-1.0 / 1.5 tiers, 88% on weight-2.0 tiers): all PASS for all three variants.

What's new in v3.1 (vs v3.0)

  • Three sizes: 8B and 4B alongside the 15B base, both surpassing the 15B's score.
  • T1 Fundamentals β†’ 100% on 8B and 15B, 91.7% on 4B (+8.3 pts vs v3.0).
  • T6 Hard Reasoning β†’ 100% clean sweep, all three variants (+25 pts vs v3.0).
  • T8 GlobalSignal / i18n β†’ 100% all three variants.
  • T10 Dioxus 0.7.4 β†’ 100% all three variants.
  • 8 tiers at 100% on the 15B; 11 tiers at 100% on the 8B (perfect).
  • Dataset: 4,880 curated examples across 43 topics (up from 4,535).

Version History

| Version | Base (params) | Score | Exam | Dataset |
|---|---|---|---|---|
| v1.0 | Qwen3-Coder-14B (14.8B) | 51/60 (85.0%) | 60Q standard | – |
| v2.0 | Qwen3-Coder-14B (14.8B) | 135.5/140 (96.8%) | 100Q weighted | 4,185 |
| v3.0 | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 |
| v3.1 15B | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 |
| v3.1 8B | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted | 4,880 |
| v3.1 4B | Qwen3-4B (4.0B, tied embeddings) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 |

Files in this repo (rockypod/neotoi-coder, 15B and historical)

| File | Format | Size | Use case |
|---|---|---|---|
| neotoi-coder-v3.1-q4_k_m.gguf | GGUF Q4_K_M | 8.4 GB | LM Studio, llama.cpp, Ollama (current 15B) |
| neotoi-coder-v3-q4_k_m_patched.gguf | GGUF Q4_K_M | 9 GB | v3.0 archive |
| neotoi-coder-v2.0-q4_k_m.gguf | GGUF Q4_K_M | 9 GB | v2.0 archive |
| neotoi-coder-v1-q4_k_m_final.gguf | GGUF Q4_K_M | 9 GB | v1.0 archive |

For the 8B and 4B Q4_K_M GGUFs, see their dedicated repos, rockypod/neotoi-coder-8b and rockypod/neotoi-coder-4b.

Enabling Thinking Mode

This model emits Qwen3 native <think>...</think> blocks. Thinking is on by default with the _patched.gguf quants on inference backends that honor qwen3.thinking.

LM Studio

| Field | Value |
|---|---|
| Before System | `<\|im_start\|>system` |
| After System | `<\|im_end\|>` |
| Before User | `<\|im_start\|>user` |
| After User | `<\|im_end\|>` |
| Before Assistant | `<\|im_start\|>assistant\n<think>` |
| After Assistant | `<\|im_end\|>` |

Ollama (custom Modelfile, 15B)

FROM neotoi-coder-v3.1-q4_k_m.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 16384
PARAMETER stop "<|im_end|>"
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>
"""
SYSTEM You are Neotoi, an expert Rust and Dioxus 0.7 developer.
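Assuming the Modelfile above is saved as `Modelfile` next to the downloaded GGUF, it can be registered and run locally (the local tag `neotoi-15b` is illustrative):

```shell
# Build a local model from the Modelfile, then chat with it
ollama create neotoi-15b -f Modelfile
ollama run neotoi-15b "Write a Dioxus 0.7 counter component with use_signal"
```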

Or simply:

ollama pull rockypod/neotoi-coder:15b

llama.cpp

./llama-cli \
  -m neotoi-coder-v3.1-q4_k_m.gguf \
  -ngl 99 \
  --temp 0.2 \
  -p "<|im_start|>user\nYour question<|im_end|>\n<|im_start|>assistant\n<think>"

What It Knows

  • Dioxus 0.7 RSX brace syntax – never function-call style
  • use_signal, use_resource with the canonical three-arm match
  • r#for on labels only, never inputs
  • WCAG 2.2 AAA: aria_labelledby, aria_describedby, live regions, role="alert", role="dialog"
  • dioxus-primitives β€” no manual ARIA on managed components
  • styles!() macro and native CSS modules
  • Tailwind v4 utility classes and semantic tokens
  • DaisyUI 5 components on Tailwind v4
  • GlobalSignal patterns (LANG / THEME), EN/VI i18n, dark-mode toggling via document::eval
  • Router patterns (#[derive(Routable)], nested layouts, query params, route guards)
  • Dioxus 0.7.4 APIs: WritableResultExt, WebSocket Stream+Sink, server-fn extractors
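As an illustration, a minimal sketch combining several of the conventions listed above: the canonical three-arm use_resource match, r#for on a label, and role="alert" for errors. The `fetch_greeting` helper and `Greeting` component are hypothetical, and the snippet assumes the dioxus 0.7 crate:

```rust
use dioxus::prelude::*;

// Hypothetical async helper standing in for a real data source.
async fn fetch_greeting() -> Result<String, String> {
    Ok("hello".to_string())
}

#[component]
fn Greeting() -> Element {
    let greeting = use_resource(fetch_greeting);
    rsx! {
        // r#for on the label only, never on inputs.
        label { r#for: "greeting-out", "Greeting" }
        // Canonical three-arm match: loaded Ok, loaded Err, still pending.
        match &*greeting.read_unchecked() {
            Some(Ok(text)) => rsx! { p { id: "greeting-out", "{text}" } },
            Some(Err(e)) => rsx! { p { role: "alert", "Failed: {e}" } },
            None => rsx! { p { id: "greeting-out", "Loading…" } },
        }
    }
}
```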

Known Limitations

  • The 15B drops the rsx! macro on 6 RSX-heavy questions (Q17 / 22 / 30 / 37 / 39 / 43); a fix is targeted for v3.2. The 8B and 4B do not reproduce these misses.
  • Non-Dioxus web frameworks – out of scope by design (SvelteKit coverage lives in rockypod/svcoder).
  • Playwright / E2E testing – out of scope.

Transparency

The training dataset itself is not redistributed – see the GitHub repo for the data-generation pipeline. Tailwind v4 reference material is treated as a competence input, not a shipped artifact.

License & Attribution

Fine-tuned weights and dataset: licensed under the Neotoi Coder Community License v1.0 – see LICENSE. Commercial use of model outputs permitted. Weight redistribution prohibited. Mental health deployment requires written permission.

Upstream models: the base model and teacher model are licensed under the Apache License, Version 2.0 – see LICENSE-APACHE and NOTICE:

The Neotoi Coder 14B weights are a derivative work of Qwen3-Coder-14B, fine-tuned via LoRA adapters on the Neotoi Coder RAFT dataset and then merged + quantized to GGUF.

Credits
