# Qwen3-8B-Function-Calling-xLAM-Unsloth

## Overview

This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit) optimized for function calling, trained with Unsloth for 2x faster training and 60% less VRAM. It was trained on the Salesforce/xlam-function-calling-60k dataset, which contains 60,000 function-calling examples with queries, tool definitions, and structured answers.
## Training Configuration

### SFT + LoRA Settings
| Parameter | Value |
|---|---|
| Unsloth Class | FastLanguageModel |
| Chat Template | built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Training Duration | 1 epoch (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |
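These settings map directly onto TRL's `SFTConfig`. A minimal sketch of how they might be wired up follows; the output directory and any arguments not listed in the table are assumptions, not the exact training script:

```python
from trl import SFTConfig

# Reconstruction of the training arguments from the table above.
training_args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size = 1 x 8 = 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=5,
    num_train_epochs=1,              # one pass over the full dataset
    optim="adamw_8bit",
    seed=3407,
    output_dir="outputs",            # assumption: not stated in the card
)
```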
### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
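In Unsloth's QLoRA workflow, this configuration corresponds to a `FastLanguageModel.get_peft_model` call. A sketch under that assumption (`model` is the 4-bit base model loaded beforehand):

```python
from unsloth import FastLanguageModel

# LoRA adapter setup matching the configuration table above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # Unsloth-optimized checkpointing
    random_state=3407,
)
```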
## Dataset

Training used [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k): 60,000 function-calling examples, each pairing a user query with available tool definitions and the expected structured tool-call answer.
## Hardware

| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |
## Training Outcome

| Metric | Value |
|---|---|
| SLURM Job ID | 36885898 |
| Runtime | 3h 48m 36s (13716s) |
| Final Training Loss | 0.2186 |
| Peak VRAM | 17.07 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
## Usage

### Quick Start (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
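Models tuned on xLAM data typically emit tool calls as a JSON array of `{"name", "arguments"}` objects. A sketch of parsing such a response and dispatching it to local tools; the `is_power_of_two` implementation here is hypothetical, standing in for whatever tool your application registers:

```python
import json

def is_power_of_two(n: int) -> bool:
    """Hypothetical tool: True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

TOOLS = {"is_power_of_two": is_power_of_two}

def dispatch(response: str) -> list:
    """Parse an xLAM-style JSON tool-call array and run each call."""
    return [TOOLS[call["name"]](**call["arguments"])
            for call in json.loads(response)]

print(dispatch('[{"name": "is_power_of_two", "arguments": {"n": 8}}, '
               '{"name": "is_power_of_two", "arguments": {"n": 1233}}]'))
# [True, False]
```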
### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```
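As a rough sanity check on why 4-bit loading fits on small GPUs: Qwen3-8B has roughly 8.2B parameters, so the NF4 weights alone occupy about 4 GB. This is a back-of-envelope estimate; real usage adds KV cache and framework overhead, and the parameter count is approximate:

```python
# Approximate weight memory for 4-bit (NF4) inference.
params = 8.2e9            # Qwen3-8B, approximate parameter count
bits_per_param = 4        # NF4 quantization
weights_gb = params * bits_per_param / 8 / 1e9
print(f"~{weights_gb:.1f} GB for the quantized weights alone")
```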
## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at
[Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF).

| Format | Description |
|---|---|
| Q4_K_M | Recommended: good balance of quality and size |
| Q5_K_M | Higher quality, slightly larger |
| Q8_0 | Near-lossless, largest GGUF size |
### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
```
### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512
```
## Limitations

- Language: Primarily trained on English data
- Knowledge Cutoff: Limited to the base model's training data cutoff
- Hallucinations: May generate plausible-sounding but incorrect information
- Context Length: Fine-tuned with a 2,048-token context window
- Safety: Not extensively safety-tuned; use with appropriate guardrails
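One simple guardrail for the safety point above is an allow-list: only execute tool calls whose name and argument keys match a schema you declared. A minimal sketch; the schema shape and tool names here are assumptions, not part of the model card:

```python
import json

# Allow-list of declared tools and their permitted argument keys.
SCHEMA = {"is_power_of_two": {"n"}}

def filter_calls(response: str) -> list:
    """Keep only tool calls that match the declared schema exactly."""
    safe = []
    for call in json.loads(response):
        name = call.get("name")
        args = call.get("arguments", {})
        if name in SCHEMA and set(args) == SCHEMA[name]:
            safe.append(call)
    return safe

calls = ('[{"name": "is_power_of_two", "arguments": {"n": 8}}, '
         '{"name": "delete_files", "arguments": {"path": "/"}}]')
print(filter_calls(calls))  # keeps only the declared tool call
```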
## Training Framework Versions

| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |
## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_function_calling_xlam_unsloth,
  author       = {ermiaazarkhalili},
  title        = {Qwen3-8B-Function-Calling-xLAM-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth}}
}
```
## Acknowledgments

Thanks to the Qwen team for the base model, the Unsloth team for the training framework, and Salesforce for the xlam-function-calling-60k dataset.