How to use chirbard/ppo-Worm with ml-agents:
mlagents-load-from-hf --repo-id="chirbard/ppo-Worm" --local-dir="./downloads"
---
library_name: ml-agents
tags:
- Worm
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-Worm
---

# **ppo** Agent playing **Worm**

This is a trained model of a **ppo** agent playing **Worm** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)

Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

We wrote a complete tutorial on training your first agent with ML-Agents and publishing it to the Hub:

- A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
- A *longer tutorial* to understand how ML-Agents works: https://huggingface.co/learn/deep-rl-course/unit5/introduction

### Resume the training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
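
If you launch training from a script, the resume command above can be assembled programmatically. The following is a minimal sketch, not part of ML-Agents: `build_resume_command` is a hypothetical helper, and the configuration path and run id are placeholder values you would replace with your own.

```python
import shlex
from typing import List


def build_resume_command(config_path: str, run_id: str) -> List[str]:
    """Return the argument list for resuming an ML-Agents training run.

    Hypothetical helper: it only assembles the `mlagents-learn` call shown
    above and does not start any process itself.
    """
    if not config_path or not run_id:
        raise ValueError("config_path and run_id must be non-empty")
    return ["mlagents-learn", config_path, f"--run-id={run_id}", "--resume"]


# Placeholder values (assumptions); substitute your own config file and run id.
command = build_resume_command("./config/ppo/Worm.yaml", "Worm-run")

# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once `mlagents-learn` is installed and the run id matches an existing run.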

### Watch your Agent play

You can watch your agent **playing directly in your browser**:

1. If the environment is one of the official ML-Agents environments, go to https://huggingface.co/unity
2. Find your model_id: chirbard/ppo-Worm
3. Select your `*.nn` / `*.onnx` file
4. Click on Watch the agent play 👀

## Hyperparameters

```yaml
behaviors:
  Worm:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2024
      buffer_size: 20240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.9995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5000000
    time_horizon: 1000
    summary_freq: 30000
```
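
To get a feel for what these numbers imply, the sketch below derives a few quantities from the configuration. It is a back-of-the-envelope illustration, not ML-Agents code: the values are copied by hand from the config above, `linear_learning_rate` is a hypothetical helper, and it assumes the `linear` schedule is a straight-line decay that reaches roughly zero at `max_steps`.

```python
# Values copied by hand from the configuration above.
config = {
    "batch_size": 2024,
    "buffer_size": 20240,
    "learning_rate": 0.0003,
    "num_epoch": 3,
    "gamma": 0.9995,
    "max_steps": 5_000_000,
}

# The buffer is split into minibatches of batch_size experiences.
minibatches_per_epoch = config["buffer_size"] // config["batch_size"]

# Each policy update makes num_epoch passes over the buffer.
gradient_steps_per_update = minibatches_per_epoch * config["num_epoch"]

# Rough number of policy updates over the run (assumes one update per full buffer).
approx_policy_updates = config["max_steps"] // config["buffer_size"]

# Common rule of thumb: rewards about 1 / (1 - gamma) steps ahead still matter.
effective_horizon = 1.0 / (1.0 - config["gamma"])


def linear_learning_rate(step: int,
                         initial: float = config["learning_rate"],
                         max_steps: int = config["max_steps"]) -> float:
    """Approximate a linear decay from `initial` toward zero at `max_steps`.

    Hypothetical helper (assumption): the exact floor used by ML-Agents
    is not modelled here.
    """
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return initial * fraction_left


print("minibatches per epoch:", minibatches_per_epoch)
print("gradient steps per policy update:", gradient_steps_per_update)
print("approximate policy updates:", approx_policy_updates)
print("effective horizon (steps):", round(effective_horizon))
print("learning rate at the halfway point:", linear_learning_rate(2_500_000))
```

With `buffer_size` exactly ten times `batch_size`, each update uses every collected experience with no remainder, and the high `gamma` means the agent weighs rewards over roughly two episodes' worth of `time_horizon` steps.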