# Distil-Qwen3-4B-Text2SQL-GGUF

GGUF format of distil-qwen3-4b-text2sql for local inference with Ollama, llama.cpp, and other GGUF-compatible tools.

For a smaller download, see the 4-bit quantized version (~2.5 GB).

## Results

| Metric | DeepSeek-V3 (Teacher) | Qwen3-4B (Base) | This Model |
|---|---|---|---|
| LLM-as-a-Judge | 80% | 62% | 80% |
| Exact Match | 48% | 16% | 60% |
| ROUGE | 87.6% | 84.2% | 89.5% |

## Quick Start with Ollama

### 1. Download the model files

```shell
# Clone this repository (git-lfs is required for the large model file)
git lfs install
git clone https://huggingface.co/distil-labs/distil-qwen3-4b-text2sql-gguf
cd distil-qwen3-4b-text2sql-gguf
```

### 2. Create a Modelfile

Create a file named `Modelfile` with the following content:

```
FROM ./model.gguf

TEMPLATE """{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
{{- end }}
{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}

{{ end }}
{{- if .Tools }}# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}{{ end }}
{{- if .ToolCalls }}
{{- range .ToolCalls }}
<tool_call>
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
</tool_call>
{{- end }}
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}"""
```
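The TEMPLATE above renders conversations into Qwen's ChatML layout (`<|im_start|>role ... <|im_end|>`), leaving an open assistant turn for the model to complete. If you run the GGUF with llama.cpp or another runtime where you assemble the prompt string yourself, the same layout can be sketched as follows (the helper name and example messages are illustrative only; tool-calling turns are omitted):

```python
# Minimal sketch of the ChatML layout the TEMPLATE above produces, for
# runtimes where you build the prompt string yourself. Tool-calling turns
# are omitted; build_chatml_prompt is an illustrative helper, not part of
# this model card.
def build_chatml_prompt(messages):
    """Render {"role", "content"} dicts into ChatML, ending with an open
    assistant turn so the model generates the reply."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "Generate the SQL query that answers the question."},
    {"role": "user", "content": "Schema:\nCREATE TABLE t (x INT);\n\nQuestion: How many rows?"},
])
print(prompt)
```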

### 3. Create and run the model

```shell
# Create the Ollama model
ollama create distil-qwen3-4b-text2sql -f Modelfile

# Run the model
ollama run distil-qwen3-4b-text2sql
```

## Usage with Python

Ollama exposes an OpenAI-compatible API on port 11434, so you can query the model with the `openai` client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="EMPTY")

schema = """CREATE TABLE employees (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  department TEXT,
  salary INTEGER
);"""

question = "How many employees earn more than 50000?"

response = client.chat.completions.create(
    model="distil-qwen3-4b-text2sql",
    messages=[
        {
            "role": "system",
            "content": """You are given a database schema and a natural language question. Generate the SQL query that answers the question.

Rules:
- Use only tables and columns from the provided schema
- Use uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- Use SQLite-compatible syntax
- Output only the SQL query, no explanations"""
        },
        {
            "role": "user",
            "content": f"Schema:\n{schema}\n\nQuestion: {question}"
        }
    ],
    temperature=0
)

print(response.choices[0].message.content)
# Output: SELECT COUNT(*) FROM employees WHERE salary > 50000;
```
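Since the system prompt asks for SQLite-compatible output, a lightweight guard can verify that a generated query at least parses against the schema before you execute it. Here is a sketch using the standard-library `sqlite3` module (the helper name and the fence-stripping step are assumptions for illustration, not guarantees about the model's output format):

```python
import sqlite3

def is_valid_sql(schema: str, query: str) -> bool:
    """Check that `query` parses against `schema` using SQLite's EXPLAIN,
    without running it against real data."""
    # Strip a markdown code fence in case the model wraps its answer in one.
    query = query.strip()
    if query.startswith("```"):
        query = query.strip("`").removeprefix("sql").strip()
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)       # build the schema in memory
        conn.execute("EXPLAIN " + query)  # prepare the query, but don't run it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = """CREATE TABLE employees (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  department TEXT,
  salary INTEGER
);"""

print(is_valid_sql(schema, "SELECT COUNT(*) FROM employees WHERE salary > 50000;"))  # True
print(is_valid_sql(schema, "SELECT * FROM missing_table;"))  # False
```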

## Model Details

| Property | Value |
|---|---|
| Format | GGUF (F16) |
| Size | ~15 GB |
| Base Model | distil-labs/distil-qwen3-4b-text2sql |
| Parameters | 4 billion |
| Context Length | 262,144 tokens |

## Related Models

| Model | Format | Size | Use Case |
|---|---|---|---|
| distil-qwen3-4b-text2sql | Safetensors | ~8 GB | Transformers, vLLM |
| This model | GGUF (F16) | ~15 GB | Ollama, llama.cpp (full precision) |
| distil-qwen3-4b-text2sql-gguf-4bit | GGUF (Q4_K_M) | ~2.5 GB | Ollama, llama.cpp (quantized) |

## License

This model is released under the Apache 2.0 license.
