Model Overview
- Base model: Qwen/Qwen3-8B
- Parameters: 174,587,904 trainable of 8,365,323,264 total (2.09% trainable)
- Track: Track1
- Adaptation method: LoRA (PEFT)
- Task: Mario game action prediction (completion-only SFT)
Provenance
This model is based on the publicly released Qwen/Qwen3-8B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.
Finetuning Data
- Format: JSONL
- Structure: system / user / assistant conversations
- Content:
  - System: Mario game rules and action constraints
  - User: Structured game state observations
  - Assistant: Discrete jump action decisions (Jump Level 0–6)
The dataset was constructed from game state–action pairs. The raw data is not publicly released; its high-level structure and generation procedure are documented here.
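As an illustration, a single training record might look like the following. The field values and observation schema are hypothetical (the raw data is not released); only the system/user/assistant structure and the "### Actions" completion prefix are documented in this card.

```python
import json

# Hypothetical JSONL record mirroring the documented
# system / user / assistant structure; contents are illustrative only.
record = {
    "messages": [
        {"role": "system", "content": "You control Mario. Choose a jump action (Jump Level 0-6)."},
        {"role": "user", "content": "obstacle_distance: 3 tiles; gap_width: 2 tiles; enemy_ahead: true"},
        {"role": "assistant", "content": "### Actions\nJump Level 4"},
    ]
}

# Each record occupies one line of the JSONL file
line = json.dumps(record)
```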
Training (High-level)
- Training type: Supervised fine-tuning (completion-only)
- Objective: Predict the correct action completion given game state context
- Loss masking:
  - Loss is applied only to the assistant completion tokens
  - System and user tokens are excluded from loss
- Tokenization:
  - Chat template with explicit system, user, assistant roles
  - Assistant responses always start with "### Actions"
- Max sequence length: 4096
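The completion-only masking above can be sketched in plain Python. The token IDs and the prompt/completion split here are invented for illustration; in training, the split is computed over the tokenized chat template.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def mask_prompt_labels(input_ids, prompt_len):
    """Return labels with system/user (prompt) tokens masked out,
    so the loss is computed only on the assistant completion."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: first 5 tokens are prompt, the rest is the completion
input_ids = [101, 8, 15, 42, 7, 900, 901, 902]
labels = mask_prompt_labels(input_ids, prompt_len=5)
# labels -> [-100, -100, -100, -100, -100, 900, 901, 902]
```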
Training Configuration
- Epochs: 3
- Optimizer: AdamW
- Learning rate: 2e-4
- LR scheduler: Cosine
- Batch size (per device): 2
- Gradient accumulation steps: 8
- Precision: bfloat16
- Gradient checkpointing: Enabled
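With these settings, the effective optimization batch size per device is the per-device batch size times the gradient accumulation steps:

```python
per_device_batch_size = 2
gradient_accumulation_steps = 8

# Samples contributing to each optimizer step, per device
effective_batch_size = per_device_batch_size * gradient_accumulation_steps  # 16
```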
LoRA Configuration
- Rank (r): 64
- Alpha: 128
- Dropout: 0.0
- Target modules:
  - q_proj, k_proj, v_proj, o_proj
  - gate_proj, up_proj, down_proj
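The settings above correspond to a PEFT `LoraConfig` along these lines. This is a sketch, not the exact training script; the `task_type` value is an assumption based on the causal-LM fine-tuning setup described in this card.

```python
from peft import LoraConfig

# LoRA hyperparameters matching the configuration listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)
```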
Run Instructions
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 across available devices
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-8B",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(
    base_model,
    "small-lit/overfit_small-aicrowd-mario-lora-8B",
)
model.eval()
```
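Inference follows the same chat structure used in training. A sketch of building the prompt is shown below; the message contents are placeholders, since the real system prompt and observation format come from the challenge harness, not this card.

```python
# Hypothetical message contents; the real system prompt and structured
# observation are produced by the game environment.
messages = [
    {"role": "system", "content": "Mario game rules and action constraints ..."},
    {"role": "user", "content": "Structured game state observation ..."},
]

# With the tokenizer loaded above, the prompt would be rendered via:
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )

# The model's completion is expected to begin with this prefix:
expected_prefix = "### Actions"
```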
License & Usage
- Base model license: Qwen3 License (see original model card)
- Adapter weights: Released for evaluation and research purposes only
- This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation.
Framework versions
- PEFT 0.18.1