Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers (https://arxiv.org/abs/2509.23152)
Zhicheng YANG
yangzhch6
AI & ML interests
reasoning with LLMs
Recent Activity
updated a model about 2 hours ago
yangzhch6/maxrl-qwen3-4b-base-dapo-bs128-n16-stepp400
published a model about 2 hours ago
yangzhch6/maxrl-qwen3-4b-base-dapo-bs128-n16-stepp400
upvoted a paper 4 days ago
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
Organizations
None yet