RLAIF
Team
community
Followers: 20
Recent Activity
AngelRaychev updated a dataset 1 day ago: RLAIF/webgpt
AngelRaychev published a dataset 1 day ago: RLAIF/webgpt
AngelRaychev updated a dataset 1 day ago: RLAIF/tldr
Team members: 9
RLAIF's models: 80
RLAIF/twitter_8EUB__5e-06_0.1_20_0.9_20_0.95 (updated Oct 17)
RLAIF/dpo_thinking_reddit_judge_last_minute_50_1e-6_0.02_4B_4B (updated Sep 25)
RLAIF/dpo_thinking_reddit_judge_last_minute_150_1e-6_0.02_4B_4B (updated Sep 25)
RLAIF/dpo_thinking_reddit_judge_last_minute_100_1e-6_0.02_4B_4B (updated Sep 25)
RLAIF/dpo_thinking_reddit_judge_last_minute_200_1e-6_0.02_4B_4B (updated Sep 25)
RLAIF/dpo_thinking_reddit_judge_last_minute_250_1e-6_0.02_4B_4B (updated Sep 25)
RLAIF/grpo_reddit_judge_last_minute_16_64_8_3e-5_1e-6_4B (updated Sep 24)
RLAIF/dpo_thinking_reddit_judge_full_1e-6_0.02_8B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_4B_1.7B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_8B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_full_1e-6_0.02_4B_1.7B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_full_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_offtheshelf_extra_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_offtheshelf_extra_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_extra_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_extra_1e-6_0.02_8B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_extra_1e-6_0.02_8B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_extra_1e-6_0.02_4B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_cot_extra_1e-6_0.02_4B_1.7B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_extra_1e-6_0.02_4B_1.7B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_extra_1e-6_0.02_1.7B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_extra_1e-6_0.02_1.7B_4B (updated Sep 23)
RLAIF/dpo_thinking_reddit_judge_position_bias_1e-6_0.02_0.6B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_8B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_0.6B_4B (updated Sep 23)
RLAIF/dpo_answer_reddit_judge_extra_1e-6_0.02_4B_0.6B (updated Sep 22)
RLAIF/dpo_thinking_reddit_judge_position_bias_extra_1e-6_0.02_4B_0.6B (updated Sep 22)
RLAIF/dpo_thinking_reddit_judge_position_bias_1e-6_0.02_4B_14B (updated Sep 22)
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_0.6B (updated Sep 22)