XiYanSQL-QwenCoder-7B-2504 W4A16 AWQ

Quantized model derived from XGenerationLab/XiYanSQL-QwenCoder-7B-2504.

This repository contains a locally generated quantized checkpoint intended for publication on the Hugging Face Hub. The folder includes the quantized weights, the tokenizer files, and the exact quantization settings used to produce this artifact.

Format

  • Quantization type: AWQ
  • Bits: 4-bit weights / 16-bit activations
  • Calibration dataset: birdsql/bird-critic-1.0-open
  • Tested backend: Transformers
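W4A16 means the weights are stored as 4-bit integers while activations stay in 16-bit floats. The sketch below is an illustrative toy of symmetric group-wise 4-bit weight quantization, not the actual AWQ kernels used by this checkpoint:

```python
import numpy as np

def quantize_w4_groupwise(w, group_size=128):
    """Quantize a flat weight tensor to symmetric 4-bit ints, one scale per group."""
    w = w.reshape(-1, group_size)
    # Symmetric 4-bit integers span [-8, 7]; map the group max to 7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover 16-bit weights; activations stay in 16-bit floats (the 'A16')."""
    return (q.astype(np.float32) * scales).astype(np.float16)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_w4_groupwise(w, group_size=128)
w_hat = dequantize(q, s).reshape(-1).astype(np.float32)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The stored model keeps only the packed 4-bit integers plus the per-group scales, which is where the roughly 4x weight-size reduction comes from.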

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CyberCastle/XiYanSQL-QwenCoder-7B-2504-W4A16-AWQ"

# Loading this W4A16 checkpoint requires the compressed-tensors package
# to be installed alongside transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=False,
)

prompt = "Write a SQL query that lists the ten most recent orders."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

About LLMToys

This quantizer is hosted in the LLMToys repository: https://github.com/CyberCastle/LLMToys. LLMToys is a collection of practical LLM tools and experiments maintained in a single codebase. It groups reusable components for local model execution, quantization workflows, runtime tuning, and structured generation pipelines such as natural-language-to-SQL.

Quantization Configuration

| Setting | Value |
| --- | --- |
| Base model | XGenerationLab/XiYanSQL-QwenCoder-7B-2504 |
| Output folder | XiYanSQL-QwenCoder-7B-2504-W4A16-AWQ |
| Quantization scheme | AWQ |
| Weight / activation format | W4A16 |
| Model architecture | qwen2 |
| Calibration dataset | birdsql/bird-critic-1.0-open |
| Calibration split | open |
| Dataset configuration | n/a |
| Calibration samples used | 256 |
| Max sequence length | 2048 |
| Max GPU memory budget | 12.0 GiB |
| Sequential onloading | yes |
| Requested sequential targets | safe-auto |
| Effective sequential targets | Qwen2Attention, Qwen2MLP |
| Sequential targets per subgraph | 1 |
| trust_remote_code | no |
| Memory preflight mode | off |
| vLLM smoke test requested | no |
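The calibration samples feed AWQ's central idea: input channels that see large activations have their weights scaled up before 4-bit rounding (with activations scaled down correspondingly), so the most salient weights lose the least precision. The snippet below is a toy illustration of that equivalence under a simple per-channel heuristic; it is not the llmcompressor implementation:

```python
import numpy as np

def awq_style_scales(act_mag, alpha=0.5):
    """Per-input-channel scales derived from observed activation magnitudes.

    Channels with larger activations get scale > 1, so their weights are
    enlarged before rounding and shrunk back at dequantization time.
    """
    s = act_mag ** alpha
    return s / s.mean()  # normalize so overall magnitude is preserved

rng = np.random.default_rng(0)
in_features, out_features = 8, 4
W = rng.standard_normal((out_features, in_features)).astype(np.float32)
# Mean absolute activation per input channel, as a calibration set would supply.
act_mag = np.abs(rng.standard_normal((64, in_features))).mean(axis=0)

s = awq_style_scales(act_mag)
# (W * s) @ (x / s) == W @ x, so the scaling is mathematically free;
# quantizing (W * s) is what protects the high-activation channels.
W_scaled = W * s[None, :]
x = rng.standard_normal(in_features).astype(np.float32)
y_ref = W @ x
y_scaled = W_scaled @ (x / s)
print(np.allclose(y_ref, y_scaled, atol=1e-4))
```

In the real pipeline the scaled weights are then quantized group-wise to 4 bits, and the inverse scales are folded into the preceding layer so inference cost is unchanged.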

Toolchain

| Setting | Value |
| --- | --- |
| Generated at (UTC) | 2026-05-03T21:28:14Z |
| Runner entrypoint | uv run quantizer/run.py |
| llmcompressor | 0.10.1.dev127+g76b28ce7 |
| transformers | 5.6.2 |
| torch | 2.11.0+cu130 |
| compressed-tensors | 0.15.1a20260428 |

Notes

  • This README is generated automatically by the quantizer so the artifact keeps its execution context.
  • Review the original base model license and any upstream usage restrictions before publishing this checkpoint.
  • If you rerun the quantizer with different settings, regenerate and upload the full output directory again.