Model Overview

  • Base model: Qwen/Qwen3-8B
  • Parameter count: 174,587,904 trainable / 8,365,323,264 total (≈2.09% trainable)
  • Track: Track1
  • Adaptation method: LoRA (PEFT)
  • Task: Mario game action prediction (completion-only SFT)

Provenance

This model is based on the publicly released Qwen/Qwen3-8B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.

Finetuning Data

  • Format: JSONL
  • Structure: system / user / assistant conversations
  • Content:
    • System: Mario game rules and action constraints
    • User: Structured game state observations
    • Assistant: Discrete jump action decisions (Jump Level 0–6)

The dataset was constructed from game state–action pairs. Raw data is not publicly released; high-level structure and generation procedure are documented here.
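
As a rough illustration, a single record in this JSONL format might look like the sketch below. The field contents are hypothetical placeholders: the actual system prompt, observation fields, and action wording are not publicly released.

```python
import json

# Hypothetical example of one JSONL training record; all strings are
# placeholders standing in for the undisclosed prompt and observation format.
record = {
    "messages": [
        {"role": "system", "content": "You control Mario. Respond with a jump action (Jump Level 0-6)."},
        {"role": "user", "content": "obstacle_distance=4 gap_ahead=true enemy_ahead=false"},
        {"role": "assistant", "content": "### Actions\nJump Level 3"},
    ]
}

# Each record is serialized as one line of the JSONL file.
line = json.dumps(record)
parsed = json.loads(line)
```

Note that the assistant turn begins with "### Actions", matching the tokenization convention described under Training below.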

Training (High-level)

  • Training type: Supervised fine-tuning (completion-only)
  • Objective: Predict the correct action completion given game state context
  • Loss masking:
    • Loss is applied only to the assistant completion tokens
    • System and user tokens are excluded from loss
  • Tokenization:
    • Chat template with explicit system, user, assistant roles
    • Assistant responses always start with "### Actions"
  • Max sequence length: 4096
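
The loss-masking scheme above can be sketched with plain token-ID lists: prompt (system + user) positions receive the ignore index -100, the convention used by PyTorch cross-entropy and the Hugging Face trainers, so only the assistant completion contributes to the loss. The token IDs here are made up for illustration.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_labels(prompt_ids, completion_ids):
    """Completion-only labels: mask every prompt token, keep completion tokens."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Made-up token IDs standing in for the tokenized system+user context and
# the assistant's "### Actions ..." completion.
prompt_ids = [101, 7, 42, 9]
completion_ids = [55, 56, 57]
input_ids, labels = build_labels(prompt_ids, completion_ids)
print(labels)  # [-100, -100, -100, -100, 55, 56, 57]
```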

Training Configuration

  • Epochs: 3
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • LR scheduler: Cosine
  • Batch size (per device): 2
  • Gradient accumulation steps: 8
  • Precision: bfloat16
  • Gradient checkpointing: Enabled
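
Taken together, these settings give an effective batch size of 2 × 8 = 16 sequences per device per optimizer step:

```python
per_device_batch_size = 2
gradient_accumulation_steps = 8

# Effective batch size per device per optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```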

LoRA Configuration

  • Rank (r): 64
  • Alpha: 128
  • Dropout: 0.0
  • Target modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
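
A peft `LoraConfig` matching these hyperparameters could be written as follows; the `task_type` is an assumption based on the causal-LM SFT setup described above.

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type assumed for causal-LM SFT.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```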

Run Instructions

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen Qwen3-8B base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-8B",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the unmodified base weights.
model = PeftModel.from_pretrained(
    base_model,
    "small-lit/overfit_small-aicrowd-mario-lora-8B",
)

model.eval()
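
Once loaded, inference follows the same chat structure as training. The messages below are hypothetical placeholders, since the real system prompt and observation format are not released; the generation call is sketched in comments because it requires the loaded model.

```python
# Hypothetical inference prompt; the actual system prompt and observation
# encoding used in training are not publicly released.
messages = [
    {"role": "system", "content": "You control Mario. Respond with a jump action (Jump Level 0-6)."},
    {"role": "user", "content": "obstacle_distance=4 gap_ahead=true enemy_ahead=false"},
]

# With the model and tokenizer from the snippet above, generation would look like:
#   inputs = tokenizer.apply_chat_template(
#       messages, add_generation_prompt=True, return_tensors="pt"
#   ).to(model.device)
#   output = model.generate(inputs, max_new_tokens=32)
#   print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The decoded completion is expected to begin with "### Actions", as in the training data.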

License & Usage

  • Base model license: Qwen3 License (see original model card)
  • Adapter weights: Released for evaluation and research purposes only
  • This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation.

Framework versions

  • PEFT 0.18.1