# Chess-SFT-6k — Global Chess Challenge 2025

## Model Summary
Chess-SFT-6k is a small, text-only chess-playing language model fine-tuned via Supervised Fine-Tuning (SFT) to select legal and reasonable chess moves from symbolic board representations.
The model is designed for participation in the Global Chess Challenge 2025, where models must choose a legal move from a provided list without access to search, tools, or external engines at inference time.
This checkpoint represents an early-stopped SFT baseline, optimized to reduce illegal moves and catastrophic blunders, and intended as a foundation for reinforcement learning with verifiable rewards (GRPO).
## Model Details
- Developed by: Ritwika Kancharla
- Model type: Decoder-only causal language model
- Base model: Qwen/Qwen3-0.6B
- Language: English
- License: MIT
- Finetuned from: Qwen/Qwen3-0.6B
- Competition: Global Chess Challenge 2025 (AIcrowd × AGI House)
## Intended Use

### Direct Use

- Selecting a single legal chess move (UCI format) from:
  - FEN position
  - Side to move
  - List of legal moves
- Text-only inference without any tools, engines, or search
- Compatible with the Global Chess Challenge starter kit and evaluation pipeline
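The input/output contract above can be sketched in a few lines of plain Python. The prompt template and the fallback-to-first-legal-move rule below are illustrative assumptions, not the competition's official starter-kit code.

```python
def build_prompt(fen: str, legal_moves: list[str]) -> str:
    """Assemble a text-only prompt from a FEN string and a legal-move list.
    The template is illustrative; the actual starter kit may differ."""
    side = "White" if fen.split()[1] == "w" else "Black"
    return (
        f"Position (FEN): {fen}\n"
        f"Side to move: {side}\n"
        f"Legal moves (UCI): {', '.join(legal_moves)}\n"
        "Answer with the UCI string of one legal move only."
    )

def enforce_legal(model_output: str, legal_moves: list[str]) -> str:
    """Keep the model's move if it is in the legal list; otherwise fall
    back to the first legal move (a simple stand-in for the environment's
    legality enforcement)."""
    text = model_output.strip()
    move = text.split()[0] if text else ""
    return move if move in legal_moves else legal_moves[0]

start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
moves = ["e2e4", "d2d4", "g1f3"]  # truncated legal-move list for illustration
print(build_prompt(start_fen, moves))
print(enforce_legal("e2e4", moves))  # legal, kept: e2e4
print(enforce_legal("e2e5", moves))  # illegal, falls back to e2e4
```

In a real game loop the full legal-move list would come from the environment (e.g. python-chess), and the model's raw completion would be post-processed exactly like this before being submitted.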
### Downstream Use
- Research on reasoning and decision-making in small language models
- Experiments with curriculum learning and reinforcement learning (GRPO / RLVR)
- Educational or analytical chess assistants (non-engine-based)
### Out-of-Scope Use
- Replacement for classical chess engines
- Deep tactical calculation or forced-mate search
- Real-money, rated, or professional chess play
- Any inference-time use of Stockfish, search, or external tools
## Training Details

### Training Data
- Dataset: `aicrowd/ChessExplained`
- Dataset file: `ChessExplained_2500k_qwen3.parquet`
- Dataset size: 2.5M positions (1.04 GB)
- Content:
  - Symbolic chess positions
  - Legal move lists
  - Natural language explanations
- Stockfish was used only offline for data generation and evaluation.
- No external tools or engines are used at inference time.
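Turning one dataset record into a supervised training example can be sketched as below. The field names (`fen`, `legal_moves`, `best_move`, `explanation`) and the prompt/completion layout are assumptions for illustration; the real parquet schema may differ.

```python
def to_sft_example(record: dict) -> dict:
    """Convert one (position, move, explanation) record into a
    prompt/completion pair for SFT. Field names are assumed, not
    taken from the actual ChessExplained schema."""
    prompt = (
        f"Position (FEN): {record['fen']}\n"
        f"Legal moves (UCI): {', '.join(record['legal_moves'])}\n"
        "Choose the best legal move."
    )
    completion = f"{record['explanation']} Move: {record['best_move']}"
    return {"prompt": prompt, "completion": completion}

rec = {
    "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "legal_moves": ["e2e4", "d2d4"],
    "best_move": "e2e4",
    "explanation": "Controls the center and opens lines for development.",
}
example = to_sft_example(rec)
print(example["prompt"])
print(example["completion"])  # ends with "Move: e2e4"
```

The completion carries both the natural-language explanation and the final UCI move, matching the dataset's combination of symbolic positions and explanations.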
### Training Procedure
- Method: Supervised Fine-Tuning (SFT)
- Objective: Next-token prediction (Negative Log-Likelihood)
- Precision: bf16 mixed precision
- Optimizer: AdamW
- Epochs: 1
- Total steps: ~10,100
- Checkpoint selection: Early stopping based on evaluation metrics
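The next-token NLL objective listed above reduces to a short formula. The sketch below uses made-up per-token probabilities purely to show the quantity being minimized; it is not the actual training loop.

```python
import math

def next_token_nll(token_probs: list[float]) -> float:
    """Mean negative log-likelihood of the gold next tokens: the SFT loss.
    `token_probs` holds the probability the model assigned to each
    correct next token in the sequence."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Toy three-token sequence with illustrative model probabilities.
probs = [0.9, 0.6, 0.75]
print(round(next_token_nll(probs), 4))  # 0.3013
```

A perfectly confident model (probability 1.0 on every gold token) drives this loss to zero, which is why training loss alone keeps falling even after chess-specific metrics plateau.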
### Training Loss Progression (Selected)
| Step | Training Loss |
|---|---|
| 100 | 5.1071 |
| 500 | 0.5045 |
| 1,000 | 0.3913 |
| 2,000 | 0.3310 |
| 3,000 | 0.2685 |
| 4,000 | 0.2374 |
| 5,000 | 0.2245 |
| 6,000 | 0.2127 |
| 10,000 | 0.1962 |
Although training loss continued to decrease after 6k steps, chess performance began to regress, motivating early stopping.
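The selection rule implied here, pick the checkpoint by a held-out chess metric rather than by training loss, can be sketched as follows. The composite score (mean ACPL across both opponents, with values taken from the results table in the Evaluation section) is an illustrative assumption; the actual selection also weighed puzzle success and illegal-move counts.

```python
def select_checkpoint(acpl_by_step: dict[int, tuple[float, float]]) -> int:
    """Return the checkpoint step minimizing mean ACPL across both
    opponents. Training loss is deliberately ignored: lower ACPL, not
    lower loss, is the selection criterion."""
    return min(acpl_by_step, key=lambda step: sum(acpl_by_step[step]) / 2)

# (ACPL vs random, ACPL vs Stockfish) per checkpoint, from the results table.
acpl = {
    500: (600.0, 410.6),
    3000: (420.3, 126.5),
    6000: (50.7, 90.9),
    9000: (144.9, 85.5),
}
print(select_checkpoint(acpl))  # 6000
```

Under this score, step 9,000's better ACPL vs Stockfish is outweighed by its regression against the random opponent, so step 6,000 is selected, matching the released checkpoint.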
## Evaluation

### Evaluation Setup
- Evaluation performed using the official Global Chess Challenge baseline
- Legal move enforcement handled by the environment
- Move quality evaluated using Stockfish (depth 20)
- Metrics computed over full games
### Metrics
- Average Centipawn Loss (ACPL)
- Win / draw / loss rates
- Illegal move rate
- Puzzle success rate
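ACPL has a standard definition: the mean drop, in centipawns, from the engine's best available evaluation to the evaluation after the move actually played, floored at zero so that finding the best move costs nothing. A minimal sketch with toy numbers:

```python
def acpl(best_evals: list[int], played_evals: list[int]) -> float:
    """Average centipawn loss over a game, from the mover's perspective.
    Each per-move loss is the evaluation drop caused by the played move
    relative to the engine's best move, clamped at 0."""
    losses = [max(0, best - played)
              for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)

# Toy game of three moves (centipawn evaluations, mover's point of view):
# move 1 is best (loss 0), moves 2 and 3 lose 80 and 70 centipawns.
print(acpl([30, 120, -20], [30, 40, -90]))  # 50.0
```

In the competition setup these evaluations come from Stockfish at depth 20; the function above only illustrates the arithmetic behind the metric.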
### Results Across Checkpoints
| Checkpoint | Positions Trained | Puzzle Success (%) | Illegal Moves (vs Stockfish) | Avg ACPL vs Random | Avg ACPL vs Stockfish |
|---|---|---|---|---|---|
| 500 | 4,000 | 0.0 | 50 | 600.0 | 410.6 |
| 3,000 | 24,000 | 5.0 | 40 | 420.3 | 126.5 |
| 6,000 | 48,000 | 25.0 | 7 | 50.7 | 90.9 |
| 9,000 | 72,000 | 24.0 | 2 | 144.9 | 85.5 |
### Evaluation Summary
- Early training significantly reduced illegal moves and catastrophic blunders.
- ACPL vs Stockfish improved sharply up to ~6,000 steps.
- Continued SFT beyond this point led to regression despite lower training loss.
- Checkpoint 6,000 provided the best trade-off between stability and chess strength.
## Bias, Risks, and Limitations
- The model does not perform search and may miss deep tactical combinations.
- Performance depends on patterns learned from supervised data.
- Natural language explanations are not guaranteed to reflect optimal chess reasoning.
- Like all chess-playing LLMs, the model may struggle in rare or highly tactical positions.
## Recommendations
This model should be treated as a research artifact rather than a competitive chess engine.
Best performance is expected when combined with curriculum learning and reinforcement learning fine-tuning.
## Technical Specifications

### Architecture
- Decoder-only Transformer
- Autoregressive next-token prediction
- Chat template and tokenizer inherited from Qwen3
### Compute Infrastructure
- Training hardware: NVIDIA H100 (Kaggle)
- Evaluation: CPU/GPU
- Frameworks: PyTorch, Hugging Face Transformers, vLLM
- Chess environment: python-chess (evaluation only)
## Environmental Impact
- Cloud provider: Kaggle
- Hardware: NVIDIA H100
- Training duration: A few hours
- Carbon emissions: Not formally estimated
## Code and Reproducibility

- Training & evaluation codebase: https://github.com/AIcrowd/Global-Chess-Challenge-2025-Baselines
- Key scripts: `train.py`, `run_evaluation.py`
## Citation

If you use this model, please cite the base model and the competition:

- Base model: Qwen/Qwen3-0.6B
- Competition: Global Chess Challenge 2025 (AIcrowd & AGI House)
## Model Card Authors

Ritwika Kancharla
## Contact

Via Hugging Face profile