mr3haque/SLM-RL-Agent

Text Generation
Transformers
Safetensors
PEFT
English
rlhf
ppo
sft
lora
trl
small-language-models
pythia
smollm2
slm-rl-agent
  • 1 contributor
History: 3 commits
Latest commit by mr3haque: Fix num_samples 500 -> 200 to match raw evaluation files
507a812 (verified), 3 days ago
  • ppo
    Publish 15 SFT + 15 PPO checkpoints for the SLM-RL-Agent framework 3 days ago
  • sft
    Publish 15 SFT + 15 PPO checkpoints for the SLM-RL-Agent framework 3 days ago
  • .gitattributes
    1.52 kB
    initial commit 3 days ago
  • README.md
    9.62 kB
    Fix num_samples 500 -> 200 to match raw evaluation files 3 days ago