Post
3533
BAAI has released ROME🔥 evaluating 30+ large reasoning models on text & visual reasoning
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)
✨Tests visual reasoning, not just recognition
✨Covers capability × alignment × safety × efficiency
✨More transparent & reliable (less data contamination)
✨Helps make real-world deployment choices
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)
✨Tests visual reasoning, not just recognition
✨Covers capability × alignment × safety × efficiency
✨More transparent & reliable (less data contamination)
✨Helps make real-world deployment choices