Exp3-BehR: Reverted to Exp3 (0208) config, BehR-only. Exponential reward, KL on, lr=5e-6, temp=1.3. Dropped Cauchy. 300 steps, Qwen2.5-7B.
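The settings named in the note above could be recorded in a small config object, sketched here. The class name and every field name are hypothetical; only the values come from the note. The meaning of "temp" (taken here as a sampling temperature) and of "KL on" (taken here as a boolean toggle) are assumptions.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Exp3BehRConfig:
    """Hypothetical container for the values stated in the Exp3-BehR note."""

    base_model: str = "Qwen2.5-7B"      # model named in the note
    reward_shape: str = "exponential"   # "Exponential reward"; Cauchy was dropped
    use_kl: bool = True                 # "KL on" (assumed to be an on/off switch)
    learning_rate: float = 5e-6         # "lr=5e-6"
    temperature: float = 1.3            # "temp=1.3" (assumed sampling temperature)
    max_steps: int = 300                # "300 steps"


config = Exp3BehRConfig()
print(asdict(config))
```

Freezing the dataclass keeps a run's settings immutable once created, which makes it easier to compare this run against the earlier Exp3 (0208) configuration it reverts to.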
YOULING HUANG
Ricardo-H
AI & ML interests
None yet
Recent Activity
updated a collection about 4 hours ago: ws-wm-0221
Organizations
None yet