Chess-SFT-6k — Global Chess Challenge 2025

Model Summary

Chess-SFT-6k is a small, text-only language model trained with Supervised Fine-Tuning (SFT) to select legal, reasonable chess moves from symbolic board representations.
The model is designed for participation in the Global Chess Challenge 2025, where models must choose a legal move from a provided list without access to search, tools, or external engines at inference time.

This checkpoint represents an early-stopped SFT baseline, optimized to reduce illegal moves and catastrophic blunders, and intended as a foundation for reinforcement learning with verifiable rewards (GRPO).


Model Details

  • Developed by: Ritwika Kancharla
  • Model type: Decoder-only causal language model
  • Base model: Qwen/Qwen3-0.6B
  • Language: English
  • License: MIT
  • Finetuned from: Qwen/Qwen3-0.6B
  • Competition: Global Chess Challenge 2025 (AIcrowd × AGI House)

Intended Use

Direct Use

  • Selecting a single legal chess move (UCI format) from:
    • FEN position
    • Side to move
    • List of legal moves
  • Text-only inference without any tools, engines, or search
  • Compatible with the Global Chess Challenge starter kit and evaluation pipeline
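The inputs above (FEN, side to move, legal move list) can be assembled into a text prompt and the model's completion parsed back to a UCI move. A minimal sketch — the prompt wording and the `build_prompt` / `parse_move` helpers are illustrative, not the exact competition template:

```python
import re

def build_prompt(fen, side, legal_moves):
    """Format a position into a text prompt (illustrative wording,
    not the exact starter-kit template)."""
    return (
        f"FEN: {fen}\n"
        f"Side to move: {side}\n"
        f"Legal moves: {' '.join(legal_moves)}\n"
        "Choose one legal move in UCI format."
    )

def parse_move(completion, legal_moves):
    """Return the first UCI-looking token in the completion that is
    actually in the provided legal-move list, else None."""
    for token in re.findall(r"\b[a-h][1-8][a-h][1-8][qrbn]?\b", completion):
        if token in legal_moves:
            return token
    return None

legal = ["e2e4", "d2d4", "g1f3"]
prompt = build_prompt(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "white", legal,
)
print(parse_move("I will play e2e4 to control the center.", legal))  # e2e4
```

Constraining the parse to the provided legal-move list is what keeps the illegal-move rate low even when the model's free-form text drifts.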

Downstream Use

  • Research on reasoning and decision-making in small language models
  • Experiments with curriculum learning and reinforcement learning (GRPO / RLVR)
  • Educational or analytical chess assistants (non-engine-based)

Out-of-Scope Use

  • Replacement for classical chess engines
  • Deep tactical calculation or forced-mate search
  • Real-money, rated, or professional chess play
  • Any inference-time use of Stockfish, search, or external tools

Training Details

Training Data

  • Dataset: aicrowd/ChessExplained
  • Dataset file: ChessExplained_2500k_qwen3.parquet
  • Dataset size: 2.5M positions (1.04 GB)
  • Content:
    • Symbolic chess positions
    • Legal move lists
    • Natural language explanations
  • Stockfish was used only offline for data generation and evaluation.
  • No external tools or engines are used at inference time.
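A dataset row combining position, legal moves, and explanation can be turned into a chat-style SFT example. A sketch under assumed field names (`fen`, `legal_moves`, `best_move`, `explanation` — check the actual parquet schema):

```python
def row_to_sft_example(row):
    """Convert one dataset row into a chat-format SFT example.
    Field names are illustrative, not the confirmed parquet schema."""
    user = (
        f"FEN: {row['fen']}\n"
        f"Legal moves: {' '.join(row['legal_moves'])}\n"
        "Choose one legal move in UCI format."
    )
    assistant = f"{row['explanation']}\nMove: {row['best_move']}"
    return {"messages": [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}

example = row_to_sft_example({
    "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "legal_moves": ["e2e4", "d2d4"],
    "best_move": "e2e4",
    "explanation": "Grabs the center and opens lines for the bishop and queen.",
})
```

The `messages` layout matches the chat template inherited from Qwen3, so the tokenizer's chat formatting can be applied directly.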

Training Procedure

  • Method: Supervised Fine-Tuning (SFT)
  • Objective: Next-token prediction (Negative Log-Likelihood)
  • Precision: bf16 mixed precision
  • Optimizer: AdamW
  • Epochs: 1
  • Total steps: ~10,100
  • Checkpoint selection: Early stopping based on evaluation metrics
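The next-token NLL objective can be illustrated on a toy vocabulary — this is the quantity the training-loss table below tracks, shown here on hand-written probability distributions rather than real model logits:

```python
import math

def next_token_nll(probs_per_step, targets):
    """Average negative log-likelihood of the target tokens — the SFT
    training loss — computed over toy per-step distributions."""
    total = 0.0
    for probs, target in zip(probs_per_step, targets):
        total += -math.log(probs[target])
    return total / len(targets)

# Toy example: two prediction steps over a tiny move vocabulary.
steps = [
    {"e2e4": 0.7, "d2d4": 0.2, "g1f3": 0.1},
    {"e2e4": 0.1, "d2d4": 0.8, "g1f3": 0.1},
]
loss = next_token_nll(steps, ["e2e4", "d2d4"])  # ~0.29
```

Loss falls as the model assigns more probability mass to the reference move tokens; it says nothing directly about chess strength, which is why evaluation below uses engine-based metrics instead.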

Training Loss Progression (Selected)

Step      Training Loss
100       5.1071
500       0.5045
1,000     0.3913
2,000     0.3310
3,000     0.2685
4,000     0.2374
5,000     0.2245
6,000     0.2127
10,000    0.1962

Although training loss continued to decrease after 6k steps, chess performance began to regress, motivating early stopping.


Evaluation

Evaluation Setup

  • Evaluation performed using the official Global Chess Challenge baseline
  • Legal move enforcement handled by the environment
  • Move quality evaluated using Stockfish (depth 20)
  • Metrics computed over full games

Metrics

  • Average Centipawn Loss (ACPL)
  • Win / draw / loss rates
  • Illegal move rate
  • Puzzle success rate
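ACPL measures the average drop in engine evaluation caused by a player's moves. A stdlib sketch over precomputed centipawn evaluations (the real pipeline obtains these from Stockfish at depth 20; the numbers here are illustrative):

```python
def average_centipawn_loss(evals_before, evals_after):
    """ACPL: mean evaluation drop (centipawns, from the mover's point
    of view) across the player's moves.  Gains count as zero loss."""
    losses = [max(0, before - after)
              for before, after in zip(evals_before, evals_after)]
    return sum(losses) / len(losses)

# Evaluations before/after three moves (illustrative numbers):
# move 1 loses 20 cp, move 2 is best play, move 3 blunders 200 cp.
acpl = average_centipawn_loss([30, 10, -50], [10, 10, -250])  # ~73.3
```

Clamping gains to zero means ACPL only penalizes mistakes, so a single large blunder dominates the average — which is why reducing catastrophic blunders moves this metric so sharply.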

Results Across Checkpoints

Checkpoint  Positions Trained  Puzzle Success (%)  Illegal Moves (SF)  Avg ACPL vs Random  Avg ACPL vs Stockfish
500         4,000              0.0                 50                  600.0               410.6
3,000       24,000             5.0                 40                  420.3               126.5
6,000       48,000             25.0                7                   50.7                90.9
9,000       72,000             24.0                2                   144.9               85.5

Evaluation Summary

  • Early training significantly reduced illegal moves and catastrophic blunders.
  • ACPL vs Stockfish improved sharply up to ~6,000 steps.
  • Continued SFT beyond this point led to regression despite lower training loss.
  • Checkpoint 6,000 provided the best trade-off between stability and chess strength.
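The early-stopping decision can be sketched as a minimal checkpoint-selection rule. The dictionary mirrors the ACPL-vs-random column of the results table; the actual selection weighed several metrics, not this one alone:

```python
# Eval ACPL vs the random opponent per checkpoint (from the results
# table in this card); training loss kept decreasing past step 6,000,
# but this evaluation metric regressed.
acpl_vs_random = {500: 600.0, 3000: 420.3, 6000: 50.7, 9000: 144.9}

# Select on the evaluation metric, not on training loss.
best = min(acpl_vs_random, key=acpl_vs_random.get)  # 6000
```

The gap between falling training loss and regressing evaluation metrics is exactly what this rule guards against.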

Bias, Risks, and Limitations

  • The model does not perform search and may miss deep tactical combinations.
  • Performance depends on patterns learned from supervised data.
  • Natural language explanations are not guaranteed to reflect optimal chess reasoning.
  • Like all chess-playing LLMs, the model may struggle in rare or highly tactical positions.

Recommendations

This model should be treated as a research artifact rather than a competitive chess engine.
Best performance is expected when combined with curriculum learning and reinforcement learning fine-tuning.


Technical Specifications

Architecture

  • Decoder-only Transformer
  • Autoregressive next-token prediction
  • Chat template and tokenizer inherited from Qwen3

Compute Infrastructure

  • Training hardware: NVIDIA H100 (Kaggle)
  • Evaluation: CPU/GPU
  • Frameworks: PyTorch, Hugging Face Transformers, vLLM
  • Chess environment: python-chess (evaluation only)

Environmental Impact

  • Cloud provider: Kaggle
  • Hardware: NVIDIA H100
  • Training duration: A few hours
  • Carbon emissions: Not formally estimated

Code and Reproducibility


Citation

If you use this model, please cite the base model and the competition:

Base model:
Qwen/Qwen3-0.6B

Competition:
Global Chess Challenge 2025 (AIcrowd & AGI House)


Model Card Authors

Ritwika Kancharla

Contact

Via Hugging Face profile
