leondawn666 's Collections Multimodality
updated
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and
Future Frontiers
Paper
• 2506.23918
• Published
• 90
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper
• 2504.16030
• Published
• 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper
• 2505.24867
• Published
• 82
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable
Reinforcement Learning
Paper
• 2507.01006
• Published
• 251
Scaling RL to Long Videos
Paper
• 2507.07966
• Published
• 160
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with
Long-Term Memory
Paper
• 2508.09736
• Published
• 58
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual
Mathematical Reasoning
Paper
• 2508.10433
• Published
• 144
Thyme: Think Beyond Images
Paper
• 2508.11630
• Published
• 81
Paper
• 2508.10104
• Published
• 298
Paper
• 2508.11737
• Published
• 112
The Dragon Hatchling: The Missing Link between the Transformer and
Models of the Brain
Paper
• 2509.26507
• Published
• 547
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper
• 2511.15065
• Published
• 77
Thinking with Video: Video Generation as a Promising Multimodal
Reasoning Paradigm
Paper
• 2511.04570
• Published
• 240