gomoku-9x9

AlphaZero for 9x9 free-style gomoku (five-in-a-row, no opening restrictions).

Trained on a single Apple Silicon machine using PyTorch + MPS. Source code: https://github.com/jasonyandell/gomoku

How to load

import torch
from huggingface_hub import hf_hub_download
from gomoku.model import load_checkpoint

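# Download model.pt from the Hugging Face Hub (cached locally after the first call)
path = hf_hub_download("jasonyandell/gomoku-9x9", "model.pt")
# load_checkpoint returns the network and the checkpoint payload
model, payload = load_checkpoint(path, device="cpu")
model.eval()  # inference mode

# The payload carries metadata:
# payload["epoch"], payload["total_games"], payload["model_config"]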

To get a quick policy/value estimate on the empty board:

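from gomoku.game import GameState

# Encode the empty board and add a batch dimension: (1, 3, 9, 9)
x = torch.from_numpy(GameState.initial().to_planes()).unsqueeze(0)
with torch.no_grad():
    policy_logits, value = model(x)  # 81 policy logits, value in [-1, 1]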
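
The raw logits can be turned into a ranked list of candidate moves. The sketch below is plain Python so it runs without the model. It assumes the 81 logits are flattened row-major (index = row * 9 + col), which this card does not state, and `top_moves` is a hypothetical helper, not part of the gomoku package.

```python
import math

BOARD_SIZE = 9  # 9x9 board, so 81 policy logits


def top_moves(logits, occupied=(), k=3):
    """Return the k most probable legal moves as ((row, col), prob) pairs.

    Assumes row-major flattening (index = row * 9 + col); check the
    gomoku source for the real convention.
    """
    if len(logits) != BOARD_SIZE * BOARD_SIZE:
        raise ValueError("expected 81 logits")
    blocked = set(occupied)
    legal = [i for i in range(len(logits)) if i not in blocked]
    if not legal:
        return []
    # Softmax restricted to legal squares; subtract the max for stability.
    peak = max(logits[i] for i in legal)
    weights = {i: math.exp(logits[i] - peak) for i in legal}
    total = sum(weights.values())
    ranked = sorted(legal, key=lambda i: weights[i], reverse=True)[:k]
    return [(divmod(i, BOARD_SIZE), weights[i] / total) for i in ranked]
```

With a loaded model, something like `top_moves(policy_logits[0].tolist())` should list the three squares the network prefers, provided the indexing assumption holds.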

Architecture

Input: (B, 3, 9, 9) — own stones, opponent stones, side-to-move plane
Backbone: small ResNet (preset small: 64 filters x 4 residual blocks)
Policy head: 81-way logits over board squares
Value head: scalar in [-1, 1] via tanh
~316k parameters at the small preset (see gomoku/model.py)
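
As a sanity check on the parameter figure, here is a back-of-the-envelope count for one plausible layout. Everything beyond "64 filters x 4 residual blocks", the 3 input planes, and the 81-way policy output is an assumption: bias-free convolutions followed by batch norm, a 2-channel policy head, and a 1-channel value head with a 64-unit hidden layer. gomoku/model.py remains the authority on the real layout.

```python
def conv_params(c_in, c_out, kernel):
    """Weights of a bias-free 2D convolution."""
    return c_in * c_out * kernel * kernel


def bn_params(channels):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(filters=64, blocks=4, squares=81, in_planes=3,
                 policy_channels=2, value_channels=1, value_hidden=64):
    # Stem: 3x3 conv from the input planes, then batch norm.
    stem = conv_params(in_planes, filters, 3) + bn_params(filters)
    # Each residual block: two 3x3 convs, each followed by batch norm.
    block = 2 * (conv_params(filters, filters, 3) + bn_params(filters))
    # Policy head (assumed): 1x1 conv -> batch norm -> linear to 81 logits.
    policy = (conv_params(filters, policy_channels, 1)
              + bn_params(policy_channels)
              + linear_params(policy_channels * squares, squares))
    # Value head (assumed): 1x1 conv -> batch norm -> hidden linear -> scalar.
    value = (conv_params(filters, value_channels, 1)
             + bn_params(value_channels)
             + linear_params(value_channels * squares, value_hidden)
             + linear_params(value_hidden, 1))
    return stem + blocks * block + policy + value


print(count_params())  # 316506 under the assumptions above
```

The total lands close to the ~316k quoted for the small preset, but that agreement alone does not confirm the head shapes.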

Training setup

Apple Silicon (M5 Max) using PyTorch with the MPS backend
Single-process AlphaZero loop (self-play -> replay buffer -> train)
200 MCTS simulations per move during self-play
~28 seconds per training epoch
Checkpoints written every 5 epochs to checkpoints/
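
The shape of that loop can be sketched as follows. This is a schematic, not the repository's code: `self_play_game` is a stand-in that returns fake samples instead of running MCTS, and the buffer size, games per epoch, batch size, and checkpoint filename pattern are all made-up values for illustration.

```python
import random
from collections import deque


def self_play_game(rng, sims_per_move=200):
    """Stand-in for MCTS self-play: returns fake (move, sims, outcome) samples."""
    n_moves = rng.randint(9, 81)  # a 9x9 game lasts at most 81 moves
    outcome = rng.choice([-1, 0, 1])
    return [(move, sims_per_move, outcome) for move in range(n_moves)]


def run(epochs, games_per_epoch=4, batch_size=32, buffer_size=10_000,
        checkpoint_every=5, seed=0):
    rng = random.Random(seed)
    replay = deque(maxlen=buffer_size)  # oldest samples fall out automatically
    checkpoints = []
    for epoch in range(1, epochs + 1):
        # 1) Self-play: generate games with the current network.
        for _ in range(games_per_epoch):
            replay.extend(self_play_game(rng))
        # 2) Train: sample a minibatch from the replay buffer.
        batch = rng.sample(list(replay), min(batch_size, len(replay)))
        _ = batch  # a real loop would run an optimizer step here
        # 3) Checkpoint every `checkpoint_every` epochs (hypothetical filename).
        if epoch % checkpoint_every == 0:
            checkpoints.append(f"checkpoints/epoch_{epoch}.pt")
    return checkpoints, len(replay)
```

If epoch time stays near 28 seconds, a checkpoint every 5 epochs works out to roughly one every 140 seconds.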

Current strength (honest)

Training is ongoing. These checkpoints come from a small single-machine run on Apple Silicon, not a production-grade AlphaZero engine. Playing strength is expected to improve with checkpoint epoch; see training_state.json in this repo for the epoch this snapshot corresponds to.

Files

model.pt — slimmed checkpoint (weights + model_config + provenance). Compatible with gomoku.model.load_checkpoint.
config.json — model architecture config as JSON.
training_state.json — {epoch, total_games, wandb_run_id?} for the current snapshot.
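
A minimal sketch of reading training_state.json, assuming it is a flat JSON object with the keys listed above and that wandb_run_id may be absent. `read_training_state` is a hypothetical helper, and the sample values are invented for illustration.

```python
import json


def read_training_state(text):
    """Parse training_state.json content; wandb_run_id is optional."""
    state = json.loads(text)
    missing = {"epoch", "total_games"} - state.keys()
    if missing:
        raise KeyError(f"training_state.json is missing {sorted(missing)}")
    return {
        "epoch": int(state["epoch"]),
        "total_games": int(state["total_games"]),
        "wandb_run_id": state.get("wandb_run_id"),  # None when absent
    }


# Made-up values for illustration only, not the real snapshot.
example = '{"epoch": 10, "total_games": 500}'
print(read_training_state(example))
```

The real file can be fetched the same way as model.pt, with hf_hub_download("jasonyandell/gomoku-9x9", "training_state.json"), and its contents passed to the helper.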

License

MIT.
