---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: LunarLander-Kratuzen
  results:
  - task:
      type: moon-landing-training
      name: reinforcement-learning-training
    dataset:
      name: state-action-landing-data
      type: reinforcement-learning-generated-data
    metrics:
    - type: mean_reward
      value: 266.40 +/- 21.38
      name: mean_reward
      verified: false
---
# 🚀 PPO Agent: LunarLander-Kratuzen

This is a trained **PPO (Proximal Policy Optimization)** agent for the `LunarLander-v2` environment, built with Stable-Baselines3.

- **Repo ID:** `KraTUZen/LunarLander`
- **Model name:** `LunarLander-Kratuzen`
## 📊 Performance

- **Mean Reward:** 266.40 ± 21.38
- **Episodes Evaluated:** 10
- ✅ Consistently lands successfully, showing stability and robustness.
## 🛠️ Usage

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym

# Download the checkpoint from the Hugging Face Hub and load it
checkpoint = load_from_hub(
    repo_id="KraTUZen/LunarLander",
    filename="LunarLander-Kratuzen.zip",
)
model = PPO.load(checkpoint)

# Create the environment
env = gym.make("LunarLander-v2")

# Run a quick evaluation loop with the trained policy
obs, info = env.reset()
for _ in range(20):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
## 📦 Training Setup
| Parameter | Value |
|---|---|
| Algorithm | PPO |
| Policy | MlpPolicy |
| Timesteps | 1,000,000 |
| n_steps | 1024 |
| batch_size | 64 |
| gamma | 0.999 |
| gae_lambda | 0.98 |
| ent_coef | 0.01 |
## 🎯 Key Takeaways

- Achieves a high mean reward (266.40 ± 21.38) with stable landings.
- Ready to use directly from the Hugging Face Hub.
- Reproducible training setup for reinforcement learning experiments.