Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Paper • 2601.01321 • Published 13 days ago • 17
SAMed-2: Selective Memory Enhanced Medical Segment Anything Model Paper • 2507.03698 • Published Jul 4, 2025 • 11
Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction Paper • 2506.14837 • Published Jun 15, 2025 • 10
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning Paper • 2506.09736 • Published Jun 11, 2025 • 9
Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents Paper • 2505.23450 • Published May 29, 2025 • 9
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Paper • 2505.22334 • Published May 28, 2025 • 36
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28, 2025 • 46
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks Paper • 2505.16459 • Published May 22, 2025 • 45
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Paper • 2504.11544 • Published Apr 15, 2025 • 44
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published Apr 9, 2025 • 39
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 45
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 129
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Paper • 2411.03823 • Published Nov 6, 2024 • 49