EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning. Paper • 2601.03471 • Published 18 days ago
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs. Paper • 2511.14159 • Published Nov 18, 2025