whisper-chess-tiny-fr

Fine-tuned Whisper-tiny for chess move recognition in French (Français).

Part of the SpeakChess project: play chess by voice in EN / FR / DE / ES.

Performance

  • Test WER: 0.03% on a held-out set of 3,798 samples (75% synth / 25% human, May 2026 retrain)
  • Human-only val WER: 0.44%
  • Speaker-stratified eval (held-out contributor, 610 samples): 0.00% WER
  • Domain: chess moves only (notation like Nf3, exd5, O-O)
  • Optimized for browser inference via transformers.js (ONNX + INT8) and React Native / mobile via whisper.cpp (GGML)
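
For readers who want to reproduce numbers like the WER figures above on their own recordings, here is a minimal sketch of a corpus-level word error rate (total word-level edits divided by total reference words). The function name is hypothetical, and the card does not state whether its figures are corpus-level or averaged per utterance, so treat this as one common convention rather than the exact evaluation script.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Dynamic-programming Levenshtein distance over words.
        prev = list(range(len(h) + 1))
        for i, ref_word in enumerate(r, start=1):
            curr = [i]
            for j, hyp_word in enumerate(h, start=1):
                cost = 0 if ref_word == hyp_word else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution or match
            prev = curr
        total_edits += prev[-1]
        total_words += len(r)
    return total_edits / total_words if total_words else 0.0


# One substituted word out of five reference words.
wer = word_error_rate(["fou d 5", "petit roque"], ["fou b 5", "petit roque"])
```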

Files

ONNX (root), for transformers.js / onnxruntime-web:

  • onnx/encoder_model_int8.onnx: INT8 encoder (Conv layers kept in FP32 for WASM compatibility)
  • onnx/decoder_model_merged_int8.onnx: INT8 merged decoder
  • Standard Whisper tokenizer/processor files

Total ONNX runtime download: ~50 MB.

GGML (ggml/), for whisper.cpp / mobile (React Native, Flutter, native apps):

  • ggml/ggml-tiny.bin (77.7 MB): FP16
  • ggml/ggml-tiny-q5_0.bin (29.9 MB): Q5_0 quantized (recommended for mobile)

Usage

transformers.js (browser, Node)

import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "atamano/whisper-chess-tiny-fr",
  { dtype: { encoder_model: "int8", decoder_model_merged: "int8" } }
);

const result = await transcriber(audio, { language: "fr", task: "transcribe" });
console.log(result.text); // e.g. "fou d 5"

Python (transformers + ONNX Runtime)

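from transformers import WhisperProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = WhisperProcessor.from_pretrained("atamano/whisper-chess-tiny-fr")

# File names match the ONNX files listed under "Files"; the decoder ships as a
# merged decoder. If the files are not found, passing subfolder="onnx" may be
# needed (an assumption based on the file layout above).
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "atamano/whisper-chess-tiny-fr",
    encoder_file_name="encoder_model_int8.onnx",
    decoder_file_name="decoder_model_merged_int8.onnx",
    use_cache=True,
)

# Force French transcription without timestamps.
model.generation_config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="fr", task="transcribe", no_timestamps=True
)

# `audio` is assumed to be a 1-D float32 NumPy array of 16 kHz mono samples.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)  # e.g. "fou d 5"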

whisper.cpp / React Native / Flutter (GGML)

Download ggml/ggml-tiny-q5_0.bin and pass it to your whisper.cpp binding of choice:

  • React Native: whisper.rn (initWhisper({ filePath: '.../ggml-tiny-q5_0.bin' }))
  • Flutter: whisper_ggml
  • Native: any whisper.cpp build

⚠️ Recommended post-processing (read this)

The model outputs spoken French text ("fou d 5", "tour prend e huit", "petit roque"), not algebraic notation. To play the move on a board you need two more steps that this checkpoint does NOT do for you:

  1. Parse the spoken text into algebraic notation (e.g. "fou d 5" → Bd5).
  2. Validate against legal moves on the current board, with a fuzzy fallback for single-letter file/rank confusions (a/b/c/d/e/f/g/h sound very close in French and the small Whisper architecture confuses them on roughly 10% of utterances).

The SpeakChess web app ships an open-source TypeScript implementation of both steps at:

next-web/src/lib/chessParser.ts (MIT)

Key exports:

  • parseChessMove(text): fuzzy speech → algebraic notation
  • findClosestLegalMove(text, legalSans, language): when the parsed move isn't legal, picks the legal move whose canonical spoken form has the smallest edit distance to the transcription. Resolves ~7% of in-production misrecognitions without any model change.
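
For Python users, the parsing step can be approximated with a small lookup-based converter. The sketch below is not a port of chessParser.ts: the function name spoken_fr_to_san and the vocabulary tables are assumptions built from standard French chess terms and from the three example outputs quoted above, and it handles neither pawn captures, promotions, nor disambiguation.

```python
# Hypothetical vocabulary tables (assumptions, not taken from the project).
PIECES = {"roi": "K", "dame": "Q", "tour": "R", "fou": "B", "cavalier": "N"}
DIGITS = {"un": "1", "deux": "2", "trois": "3", "quatre": "4",
          "cinq": "5", "six": "6", "sept": "7", "huit": "8"}
CASTLES = {"petit roque": "O-O", "grand roque": "O-O-O"}
SQUARE_CHARS = set("abcdefgh12345678")


def spoken_fr_to_san(text):
    """Convert a spoken French move to SAN, or return None if a word is unknown."""
    norm = " ".join(text.lower().split())  # lowercase, collapse whitespace
    if norm in CASTLES:
        return CASTLES[norm]
    san = ""
    for word in norm.split():
        if word in PIECES:
            san += PIECES[word]
        elif word == "prend":
            san += "x"  # capture marker
        elif word in DIGITS:
            san += DIGITS[word]  # spelled-out rank, e.g. "huit"
        elif len(word) == 1 and word in SQUARE_CHARS:
            san += word  # file letter or rank digit
        else:
            return None  # unknown word: let the caller fall back to fuzzy matching
    return san or None
```

The result is only a candidate string. It still has to be checked against the legal moves of the current position, as described in step 2.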

The recommended pipeline:

audio
  ↓ Whisper (this model)
text ("fou g 5")
  ↓ parseChessMove
SAN ("Bg5")
  ↓ chess.js / python-chess legal moves filter
   │
   ├─ legal → play it
   └─ illegal → findClosestLegalMove(text, legalSans, "fr")
                  ↓ ("Bd5" with edit-distance 1)
                play it
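
The fallback branch of this pipeline can be sketched in Python as a character-level edit-distance search over the spoken forms of the legal moves. This is an illustration of the idea, not the findClosestLegalMove implementation: the function names and the legal_spoken mapping (SAN to canonical spoken form) are hypothetical, and in practice the legal SAN list would come from a library such as python-chess.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def closest_legal_move(text, legal_spoken):
    """Return (san, distance) for the legal move whose spoken form is nearest.

    legal_spoken maps SAN -> canonical spoken form (hypothetical input).
    Returns None when there are no legal moves.
    """
    norm = " ".join(text.lower().split())
    best = None
    for san, spoken in legal_spoken.items():
        dist = edit_distance(norm, spoken)
        if best is None or dist < best[1]:
            best = (san, dist)  # ties keep the first move encountered
    return best


# Toy position: "fou g 5" was heard, but only Bd5 is a legal bishop move.
legal = {"Bd5": "fou d 5", "Nf3": "cavalier f 3", "O-O": "petit roque"}
match = closest_legal_move("fou g 5", legal)
```

A production version would likely also reject matches above some maximum distance rather than always returning the nearest move; that threshold is a design choice left out here.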

A vocabulary file with all (notation, spoken form, language) triples used at training time is available at data/processed/training_moves.json; regenerate it from the canonical vocab via python training/generate_moves.py.

Training

See github.com/atamano/speakchess for the full pipeline.

  • Base: openai/whisper-tiny (39M params, 98% trainable via full fine-tune)
  • Method: Full fine-tuning, no LoRA, no gradient checkpointing (MPS quirk)
  • Data: 9,582 synthetic (Edge TTS, France / Canada / Belgium / Switzerland accents) + 3,097 validated human recordings from speakchess.indiefoundry.com/contribute
  • Augmentation: SpecAugment + audiomentations at runtime (noise / pitch / time-stretch / EQ)
  • suppress_tokens whitelist: 164 chess-vocab tokens, all others suppressed at generation
  • 5 epochs, batch 8 × grad-accum 2, LR 1e-4 linear, human-oversample 4×
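
The suppress_tokens whitelist mentioned above can be understood as the complement of the allowed set: every token id outside the whitelist goes into the suppress list. Below is a minimal sketch of that construction. The helper name and the toy token ids are hypothetical, and the card does not say whether the 164 whitelisted tokens include special tokens (end-of-text, language, and task tokens), which would need to stay allowed for generation to terminate.

```python
def build_suppress_tokens(vocab_size, allowed_token_ids):
    """Return every token id in [0, vocab_size) that is NOT whitelisted."""
    allowed = set(allowed_token_ids)
    return [tid for tid in range(vocab_size) if tid not in allowed]


# Toy example with a 10-token vocabulary and 3 allowed ids.
suppress = build_suppress_tokens(10, [2, 5, 7])
```

With Hugging Face transformers, a list like this would typically be assigned to the suppress_tokens field of the model's generation config, with vocab_size read from the model config.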

License (important)

This model is licensed under CC BY-NC-SA 4.0. Summary:

  • ✅ Free to use for personal projects, research, education, and non-commercial demos
  • ✅ Free to share and adapt with attribution and same-license derivatives
  • ❌ Commercial use is NOT permitted without a separate license
  • ❌ Including the model in commercial products, paid services, or competing voice-chess offerings requires explicit written permission

For commercial licensing inquiries: antoine@darksquares.net

The full license text: https://creativecommons.org/licenses/by-nc-sa/4.0/

All rights reserved beyond the CC BY-NC-SA 4.0 grant.
