SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature Paper • 2601.10108 • Published Jan 15 • 7
WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts Paper • 2606.03220 • Published 5 days ago • 8
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code Paper • 2508.18106 • Published Aug 25, 2025 • 350