Reward Models Inherit Value Biases from Pretraining ICLR2026 (Collection). Reward models and logprobs for the paper Christian et al., "Reward Models Inherit Value Biases from Pretraining" (ICLR 2026) • 24 items • Updated Feb 23