mr3haque/SLM-RL-Agent
Tags: Text Generation, Transformers, Safetensors, PEFT, mr3haque/SLM-RL-Agent-Data, English, rlhf, ppo, sft, lora, trl, small-language-models, pythia, smollm2, slm-rl-agent
License: apache-2.0
Branch: main
1 contributor
History: 3 commits
Latest commit: mr3haque, "Fix num_samples 500 -> 200 to match raw evaluation files" (507a812, verified), 3 days ago
ppo: Publish 15 SFT + 15 PPO checkpoints for the SLM-RL-Agent framework (3 days ago)
sft: Publish 15 SFT + 15 PPO checkpoints for the SLM-RL-Agent framework (3 days ago)
.gitattributes (1.52 kB, Safe): initial commit (3 days ago)
README.md (9.62 kB): Fix num_samples 500 -> 200 to match raw evaluation files (3 days ago)