Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases Paper • 2605.27355 • Published 12 days ago • 7