# Reinforce Agent on Pixelcopter-PLE-v0
This repository contains a trained Reinforce (Policy Gradient) agent that successfully plays the Pixelcopter-PLE-v0 environment.
## Model Card
- **Model Name:** Reinforce-Pixelcopter-PLE-v0
- **Environment:** Pixelcopter-PLE-v0
- **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
- **Performance:** mean reward of 12.03 (self-reported) on Pixelcopter-PLE-v0; the agent maintains stable flight and avoids obstacles across evaluation runs
## Usage
```python
import pickle

import gym
import gym_pygame  # noqa: F401  (assumed dependency that registers Pixelcopter-PLE-v0 with gym)
from huggingface_hub import hf_hub_download

# Download the trained Reinforce model from the Hub
# (the pickle is assumed to contain the policy and its metadata, including env_id)
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize environment
env = gym.make(model["env_id"])
```
## Notes
- The agent is trained using the Reinforce algorithm, which updates policy parameters based on episodic returns.
- The environment is Pixelcopter-PLE-v0, a pixel-based game where the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.
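To make the episodic-return update concrete: at the end of each episode, Reinforce computes the discounted return for every time step and weights the log-probability gradient of each action by that return. Below is a minimal, illustrative sketch of the return computation (this is not the exact training script for this repository; `gamma` is the discount factor):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each step's return reuses the next step's return
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# Example: three steps of reward 1.0 with gamma = 0.5
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Each return `G_t` then scales the policy-gradient term for the action taken at step `t`, so actions followed by high returns become more probable.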
## Repository Structure

- `reinforce.pkl`: trained policy weights
- `README.md`: documentation and usage guide
## Results
- The agent learns to maintain altitude and avoid collisions with obstacles.
- Demonstrates convergence to a stable policy using policy gradient methods.
## Environment Overview
- Observation Space: Pixel-based state representation (visual input)
- Action Space: Discrete (flap or no flap)
- Objective: Keep the helicopter flying while avoiding obstacles
- Reward: Positive reward for survival, penalties for collisions
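The observation/action/reward cycle above follows the classic Gym interaction loop. Here is a generic rollout sketch (a hypothetical helper, assuming the old 4-tuple `step` return used by gym at the time; `policy` is any callable mapping an observation to an action):

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)                      # e.g. flap (1) or no flap (0)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:                                  # collision or episode end
            break
    return total_reward
```

With the trained model loaded as shown in the Usage section, `run_episode(env, model_policy)` would evaluate one flight.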
## Learning Highlights
- Algorithm: Reinforce (Policy Gradient)
- Update Rule: Policy parameters updated using returns from sampled episodes
- Strengths: Effective for environments with discrete actions and episodic rewards
- Limitations: High variance in updates, mitigated with sufficient training episodes
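One common way to reduce the variance mentioned above (not necessarily what this repository's training script does) is to standardize the returns within each episode or batch before taking the gradient step, so updates are driven by whether a return is above or below average rather than by its raw magnitude:

```python
def standardize_returns(returns, eps=1e-8):
    """Shift returns to zero mean and scale to unit standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((g - mean) ** 2 for g in returns) / n
    std = var ** 0.5
    return [(g - mean) / (std + eps) for g in returns]

# Returns above the mean become positive weights, those below become negative
print(standardize_returns([1.0, 2.0, 3.0]))
```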
## Evaluation Results

- **mean_reward** on Pixelcopter-PLE-v0: 12.03 (self-reported)