Model Overview

  • Base model: Qwen/Qwen3-8B
  • Parameter count: 174,587,904 trainable / 8,365,323,264 total (≈2.09% trainable)
  • Track: Track1
  • Adaptation method: LoRA (PEFT)
  • Task: Mario game action prediction (completion-only SFT)

Provenance

This model is based on the publicly released Qwen/Qwen3-8B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.

Finetuning Data

  • Format: JSONL
  • Structure: system / user / assistant conversations
  • Content:
    • System: Mario game rules and action constraints
    • User: Structured game state observations
    • Assistant: Discrete jump action decisions (Jump Level 0–6)

The dataset was constructed from game state–action pairs. Raw data is not publicly released; high-level structure and generation procedure are documented here.
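
As a rough illustration, a single record in this JSONL format might look like the sketch below. The field contents are hypothetical placeholders: the actual system prompt, observation fields, and action wording are not publicly released.

```python
import json

# Hypothetical example of one JSONL training record; all strings are
# placeholders standing in for the undisclosed prompt and observation format.
record = {
    "messages": [
        {"role": "system", "content": "You control Mario. Respond with a jump action (Jump Level 0-6)."},
        {"role": "user", "content": "obstacle_distance=4 gap_ahead=true enemy_ahead=false"},
        {"role": "assistant", "content": "### Actions\nJump Level 3"},
    ]
}

# Each record is serialized as one line of the JSONL file.
line = json.dumps(record)
parsed = json.loads(line)
```

Note that the assistant turn begins with "### Actions", matching the tokenization convention described under Training below.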

Training (High-level)

  • Training type: Supervised fine-tuning (completion-only)
  • Objective: Predict the correct action completion given game state context
  • Loss masking:
    • Loss is applied only to the assistant completion tokens
    • System and user tokens are excluded from loss
  • Tokenization:
    • Chat template with explicit system, user, assistant roles
    • Assistant responses always start with "### Actions"
  • Max sequence length: 4096
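
The loss-masking scheme above can be sketched with plain token-ID lists: prompt (system + user) positions receive the ignore index -100, the convention used by PyTorch cross-entropy and the Hugging Face trainers, so only the assistant completion contributes to the loss. The token IDs here are made up for illustration.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_labels(prompt_ids, completion_ids):
    """Completion-only labels: mask every prompt token, keep completion tokens."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Made-up token IDs standing in for the tokenized system+user context and
# the assistant's "### Actions ..." completion.
prompt_ids = [101, 7, 42, 9]
completion_ids = [55, 56, 57]
input_ids, labels = build_labels(prompt_ids, completion_ids)
print(labels)  # [-100, -100, -100, -100, 55, 56, 57]
```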

Training Configuration

  • Epochs: 3
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • LR scheduler: Cosine
  • Batch size (per device): 2
  • Gradient accumulation steps: 8
  • Precision: bfloat16
  • Gradient checkpointing: Enabled
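
Taken together, these settings give an effective batch size of 2 × 8 = 16 sequences per device per optimizer step:

```python
per_device_batch_size = 2
gradient_accumulation_steps = 8

# Effective batch size per device per optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```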

LoRA Configuration

  • Rank (r): 64
  • Alpha: 128
  • Dropout: 0.0
  • Target modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
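
A peft `LoraConfig` matching these hyperparameters could be written as follows; the `task_type` is an assumption based on the causal-LM SFT setup described above.

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type assumed for causal-LM SFT.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```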

Run Instructions

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen Qwen3-8B base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-8B",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the unmodified base weights.
model = PeftModel.from_pretrained(
    base_model,
    "small-lit/overfit_small-aicrowd-mario-lora-8B",
)

model.eval()
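
Once loaded, inference follows the same chat structure as training. The messages below are hypothetical placeholders, since the real system prompt and observation format are not released; the generation call is sketched in comments because it requires the loaded model.

```python
# Hypothetical inference prompt; the actual system prompt and observation
# encoding used in training are not publicly released.
messages = [
    {"role": "system", "content": "You control Mario. Respond with a jump action (Jump Level 0-6)."},
    {"role": "user", "content": "obstacle_distance=4 gap_ahead=true enemy_ahead=false"},
]

# With the model and tokenizer from the snippet above, generation would look like:
#   inputs = tokenizer.apply_chat_template(
#       messages, add_generation_prompt=True, return_tensors="pt"
#   ).to(model.device)
#   output = model.generate(inputs, max_new_tokens=32)
#   print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The decoded completion is expected to begin with "### Actions", as in the training data.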

License & Usage

  • Base model license: Qwen3 License (see original model card)
  • Adapter weights: Released for evaluation and research purposes only
  • This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation.

Framework versions

  • PEFT 0.18.1