ihbkaiser/trl-mcsd
arxiv: 2402.03300, 2305.18290, 2407.21783
main: trl-mcsd/examples/scripts
320 kB
1 contributor
History: 1 commit
Latest commit: ihbkaiser, "Implement MCSD for experimental SDPO" (1fa3c6c, verified, about 2 months ago)
Name                          Size       Last commit                           Updated
nemo_gym                      -          Implement MCSD for experimental SDPO  about 2 months ago
openenv                       -          Implement MCSD for experimental SDPO  about 2 months ago
ppo                           -          Implement MCSD for experimental SDPO  about 2 months ago
async_grpo.py                 2.3 kB     Implement MCSD for experimental SDPO  about 2 months ago
bco.py                        5.85 kB    Implement MCSD for experimental SDPO  about 2 months ago
cpo.py                        3.49 kB    Implement MCSD for experimental SDPO  about 2 months ago
dpo.py                        917 Bytes  Implement MCSD for experimental SDPO  about 2 months ago
dpo_vlm.py                    4.28 kB    Implement MCSD for experimental SDPO  about 2 months ago
gkd.py                        5.05 kB    Implement MCSD for experimental SDPO  about 2 months ago
grpo_2048.py                  5.2 kB     Implement MCSD for experimental SDPO  about 2 months ago
grpo_agent.py                 10.3 kB    Implement MCSD for experimental SDPO  about 2 months ago
grpo_vlm.py                   5.23 kB    Implement MCSD for experimental SDPO  about 2 months ago
gspo.py                       4.42 kB    Implement MCSD for experimental SDPO  about 2 months ago
gspo_vlm.py                   4.85 kB    Implement MCSD for experimental SDPO  about 2 months ago
kto.py                        3.59 kB    Implement MCSD for experimental SDPO  about 2 months ago
mpo_vlm.py                    4.26 kB    Implement MCSD for experimental SDPO  about 2 months ago
nash_md.py                    5.1 kB     Implement MCSD for experimental SDPO  about 2 months ago
online_dpo.py                 5.37 kB    Implement MCSD for experimental SDPO  about 2 months ago
online_dpo_vlm.py             7.71 kB    Implement MCSD for experimental SDPO  about 2 months ago
orpo.py                       3.57 kB    Implement MCSD for experimental SDPO  about 2 months ago
prm.py                        4.44 kB    Implement MCSD for experimental SDPO  about 2 months ago
reward_modeling.py            4.42 kB    Implement MCSD for experimental SDPO  about 2 months ago
rloo.py                       3.44 kB    Implement MCSD for experimental SDPO  about 2 months ago
rloo_vlm.py                   5.23 kB    Implement MCSD for experimental SDPO  about 2 months ago
sdpo_rar_science.py           23.5 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft.py                        917 Bytes  Implement MCSD for experimental SDPO  about 2 months ago
sft_gemma3.py                 1.98 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft_gpt_oss.py                3.19 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft_nemotron_3.py             4.09 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft_tiny_aya_tool_calling.py  5.31 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft_video_llm.py              8.2 kB     Implement MCSD for experimental SDPO  about 2 months ago
sft_vlm.py                    3.92 kB    Implement MCSD for experimental SDPO  about 2 months ago
sft_vlm_gemma3.py             6.69 kB    Implement MCSD for experimental SDPO  about 2 months ago
tiny_aya_chat_template.jinja  5.66 kB    Implement MCSD for experimental SDPO  about 2 months ago
xpo.py                        4.5 kB     Implement MCSD for experimental SDPO  about 2 months ago