# Qwen3-8B-Function-Calling-xLAM-Unsloth

This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit) optimized for function calling using Unsloth for 2x faster training and 60% less VRAM.

Trained on the Salesforce/xlam-function-calling-60k dataset, which contains 60,000 function-calling examples with queries, tool definitions, and structured answers.

## Overview

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen3-8B (Unsloth 4-bit) |
| Model Size | 8B parameters |
| Training Framework | Unsloth + TRL |
| Training Method | SFT with QLoRA (4-bit) |
| Context Length | 2,048 tokens |
| GGUF Available | Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|---|---|
| Unsloth Class | FastLanguageModel |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth-optimized) |
| Seed | 3407 |

### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
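The two tables above can be mapped onto the standard Unsloth + TRL APIs roughly as follows. This is a hedged sketch, not the exact training script: the base checkpoint name (`unsloth/Qwen3-8B-unsloth-bnb-4bit`) is an assumption, and only the hyperparameters listed in the tables are reproduced.

```python
# Sketch only: reproduces the tabled hyperparameters with Unsloth + TRL.
# The base checkpoint name below is an assumption, not confirmed by this card.
from unsloth import FastLanguageModel
from trl import SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",  # assumed 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                    # 4-bit QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # Unsloth-optimized checkpointing
)

args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=5,
    num_train_epochs=1,
    optim="adamw_8bit",
    seed=3407,
)
```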

## Dataset

| Property | Value |
|---|---|
| Dataset | xLAM Function Calling 60K |
| Training Samples | 60,000 |
| Format | XML-tagged: `<query>`, `<tools>`, `<answers>` |
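A minimal sketch of how one xLAM record could be rendered into the XML-tagged format above. The field names (`query`, `tools`, `answers`) follow the dataset card, but the exact serialization used for this training run is an assumption.

```python
import json

def format_xlam_example(record: dict) -> str:
    """Render one xLAM record into the <query>/<tools>/<answers> format.

    Assumption: `tools` and `answers` arrive either as JSON strings or as
    JSON-serializable objects, as in Salesforce/xlam-function-calling-60k.
    """
    tools = record["tools"] if isinstance(record["tools"], str) else json.dumps(record["tools"])
    answers = record["answers"] if isinstance(record["answers"], str) else json.dumps(record["answers"])
    return (
        f"<query>{record['query']}</query>\n"
        f"<tools>{tools}</tools>\n"
        f"<answers>{answers}</answers>"
    )

# Illustrative record shaped like a dataset row (not an actual sample).
sample = {
    "query": "Check if the numbers 8 and 1233 are powers of two.",
    "tools": [{"name": "is_power_of_two", "parameters": {"num": "int"}}],
    "answers": [{"name": "is_power_of_two", "arguments": {"num": 8}},
                {"name": "is_power_of_two", "arguments": {"num": 1233}}],
}
print(format_xlam_example(sample))
```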

## Hardware

| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |

## Training Outcome

| Metric | Value |
|---|---|
| SLURM Job ID | 36885898 |
| Runtime | 3h 48m 36s (13,716s) |
| Final Training Loss | 0.2186 |
| Peak VRAM | 17.07 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
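The quick-start prompt asks the model to call a power-of-two checker. A sketch of what such a tool could look like on the application side: both the OpenAI-style schema and the local implementation are illustrative assumptions, not part of the released model.

```python
# Hypothetical tool the model might emit calls for. The schema follows the
# common OpenAI-style function format; the implementation uses the standard
# bit trick: n is a power of two iff it is positive with exactly one set bit.
tool_schema = {
    "name": "is_power_of_two",
    "description": "Check whether an integer is a power of two.",
    "parameters": {
        "type": "object",
        "properties": {"num": {"type": "integer"}},
        "required": ["num"],
    },
}

def is_power_of_two(num: int) -> bool:
    return num > 0 and (num & (num - 1)) == 0

print(is_power_of_two(8), is_power_of_two(1233))  # → True False
```

Recent Transformers releases also accept a `tools=` argument in `apply_chat_template`, so a schema like this can be passed directly when building the prompt.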

### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at: Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF

| Format | Description |
|---|---|
| Q4_K_M | Recommended: good balance of quality and size |
| Q5_K_M | Higher quality, slightly larger |
| Q8_0 | Near-lossless, largest GGUF size |

### Using with Ollama

```shell
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
```

### Using with llama.cpp

```shell
./llama-cli -m Qwen3-8B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge cutoff**: Limited to the base model's training-data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_function_calling_xlam_unsloth,
    author = {ermiaazarkhalili},
    title = {Qwen3-8B-Function-Calling-xLAM-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth}}
}
```
