# Qwen3-8B-Function-Calling-xLAM-Unsloth

This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit) optimized for function calling using Unsloth for 2x faster training and 60% less VRAM.

Trained on the Salesforce/xlam-function-calling-60k dataset, which contains 60,000 function-calling examples with queries, tool definitions, and structured answers.

## Overview

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen3-8B (Unsloth 4-bit) |
| Model Size | 8B parameters |
| Training Framework | Unsloth + TRL |
| Training Method | SFT with QLoRA (4-bit) |
| Context Length | 2,048 tokens |
| GGUF Available | Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|---|---|
| Unsloth Class | FastLanguageModel |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth-optimized) |
| Seed | 3407 |

### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
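The two tables above can be mapped onto the standard Unsloth + TRL APIs roughly as follows. This is a hedged sketch, not the exact training script: the base checkpoint name (`unsloth/Qwen3-8B-unsloth-bnb-4bit`) is an assumption, and only the hyperparameters listed in the tables are reproduced.

```python
# Sketch only: reproduces the tabled hyperparameters with Unsloth + TRL.
# The base checkpoint name below is an assumption, not confirmed by this card.
from unsloth import FastLanguageModel
from trl import SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",  # assumed 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                    # 4-bit QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # Unsloth-optimized checkpointing
)

args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=5,
    num_train_epochs=1,
    optim="adamw_8bit",
    seed=3407,
)
```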

## Dataset

| Property | Value |
|---|---|
| Dataset | xLAM Function Calling 60K |
| Training Samples | 60,000 |
| Format | XML-tagged: `<query>`, `<tools>`, `<answers>` |
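A minimal sketch of how one xLAM record could be rendered into the XML-tagged format above. The field names (`query`, `tools`, `answers`) follow the dataset card, but the exact serialization used for this training run is an assumption.

```python
import json

def format_xlam_example(record: dict) -> str:
    """Render one xLAM record into the <query>/<tools>/<answers> format.

    Assumption: `tools` and `answers` arrive either as JSON strings or as
    JSON-serializable objects, as in Salesforce/xlam-function-calling-60k.
    """
    tools = record["tools"] if isinstance(record["tools"], str) else json.dumps(record["tools"])
    answers = record["answers"] if isinstance(record["answers"], str) else json.dumps(record["answers"])
    return (
        f"<query>{record['query']}</query>\n"
        f"<tools>{tools}</tools>\n"
        f"<answers>{answers}</answers>"
    )

# Illustrative record shaped like a dataset row (not an actual sample).
sample = {
    "query": "Check if the numbers 8 and 1233 are powers of two.",
    "tools": [{"name": "is_power_of_two", "parameters": {"num": "int"}}],
    "answers": [{"name": "is_power_of_two", "arguments": {"num": 8}},
                {"name": "is_power_of_two", "arguments": {"num": 1233}}],
}
print(format_xlam_example(sample))
```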

## Hardware

| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |

## Training Outcome

| Metric | Value |
|---|---|
| SLURM Job ID | 36885898 |
| Runtime | 3h 48m 36s (13,716s) |
| Final Training Loss | 0.2186 |
| Peak VRAM | 17.07 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
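The quick-start prompt asks the model to call a power-of-two checker. A sketch of what such a tool could look like on the application side: both the OpenAI-style schema and the local implementation are illustrative assumptions, not part of the released model.

```python
# Hypothetical tool the model might emit calls for. The schema follows the
# common OpenAI-style function format; the implementation uses the standard
# bit trick: n is a power of two iff it is positive with exactly one set bit.
tool_schema = {
    "name": "is_power_of_two",
    "description": "Check whether an integer is a power of two.",
    "parameters": {
        "type": "object",
        "properties": {"num": {"type": "integer"}},
        "required": ["num"],
    },
}

def is_power_of_two(num: int) -> bool:
    return num > 0 and (num & (num - 1)) == 0

print(is_power_of_two(8), is_power_of_two(1233))  # → True False
```

Recent Transformers releases also accept a `tools=` argument in `apply_chat_template`, so a schema like this can be passed directly when building the prompt.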

### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at: Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF

| Format | Description |
|---|---|
| Q4_K_M | Recommended: good balance of quality and size |
| Q5_K_M | Higher quality, slightly larger |
| Q8_0 | Near-lossless, largest GGUF size |

### Using with Ollama

```shell
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
```

### Using with llama.cpp

```shell
./llama-cli -m Qwen3-8B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge cutoff**: Limited to the base model's training-data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_function_calling_xlam_unsloth,
    author = {ermiaazarkhalili},
    title = {Qwen3-8B-Function-Calling-xLAM-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth}}
}
```
