---
library_name: stable-baselines3
tags:
  - LunarLander-v2
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
model-index:
  - name: LunarLander-Kratuzen
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: LunarLander-v2
          type: LunarLander-v2
        metrics:
          - type: mean_reward
            value: 266.40 +/- 21.38
            name: mean_reward
            verified: false
---

# 🚀 PPO Agent: LunarLander-Kratuzen

This is a trained PPO (Proximal Policy Optimization) agent for the LunarLander-v2 environment, built with Stable-Baselines3.

- **Repo ID:** KraTUZen/LunarLander
- **Model name:** LunarLander-Kratuzen


## 📊 Performance

- **Mean Reward:** 266.40 ± 21.38
- **Episodes Evaluated:** 10
- ✅ Lands successfully across evaluation episodes, with low variance in returns.
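The mean ± standard deviation figure above comes from averaging per-episode returns over the evaluation rollouts. A minimal sketch of that computation (the return values below are made-up placeholders, not the actual evaluation episodes; Stable-Baselines3's `evaluate_policy` reports the population standard deviation in the same way):

```python
import statistics

# Hypothetical per-episode returns from 10 evaluation rollouts.
episode_returns = [250.1, 280.3, 244.7, 290.5, 261.2,
                   300.4, 238.9, 270.0, 255.6, 272.3]

mean_reward = statistics.mean(episode_returns)
std_reward = statistics.pstdev(episode_returns)  # population std, matching np.std(ddof=0)
print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```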

πŸ› οΈ Usage

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym

# Download the checkpoint from the Hugging Face Hub, then load it.
# load_from_hub returns the local path to the downloaded .zip file.
checkpoint = load_from_hub(
    repo_id="KraTUZen/LunarLander",
    filename="LunarLander-Kratuzen.zip",
)
model = PPO.load(checkpoint)

# Create the environment
env = gym.make("LunarLander-v2")

# Run a quick evaluation loop using the trained policy
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

## 📦 Training Setup

| Parameter  | Value     |
| ---------- | --------- |
| Algorithm  | PPO       |
| Policy     | MlpPolicy |
| Timesteps  | 1,000,000 |
| n_steps    | 1024      |
| batch_size | 64        |
| gamma      | 0.999     |
| gae_lambda | 0.98      |
| ent_coef   | 0.01      |
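The table above maps directly onto PPO's constructor arguments in Stable-Baselines3. A minimal sketch of the configuration (the commented-out training call assumes stable-baselines3 and Box2D are installed; any parameter not listed in the table falls back to library defaults):

```python
# Hyperparameters from the training-setup table, as PPO keyword arguments.
ppo_kwargs = {
    "n_steps": 1024,      # rollout length per environment before each update
    "batch_size": 64,     # minibatch size for gradient steps
    "gamma": 0.999,       # high discount: landing rewards arrive late in the episode
    "gae_lambda": 0.98,   # GAE smoothing for advantage estimates
    "ent_coef": 0.01,     # entropy bonus encouraging exploration
}
total_timesteps = 1_000_000

# With stable-baselines3 and Box2D installed, training would look like:
# from stable_baselines3 import PPO
# import gymnasium as gym
# model = PPO("MlpPolicy", gym.make("LunarLander-v2"), **ppo_kwargs)
# model.learn(total_timesteps=total_timesteps)
# model.save("LunarLander-Kratuzen")
```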

## 🎯 Key Takeaways

- Achieves a high mean reward (266.40 over 10 evaluation episodes) with stable landings.
- Loads directly from the Hugging Face Hub via `huggingface_sb3`.
- Training hyperparameters are documented above for reproducible reinforcement-learning experiments.