GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics Paper • 2602.12617 • Published 6 days ago • 20
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3, 2025 • 14 • 2
Towards RAW Object Detection in Diverse Conditions Paper • 2411.15678 • Published Nov 24, 2024 • 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness Paper • 2501.07978 • Published Jan 14, 2025 • 1
Gaussian Splatting with Discretized SDF for Relightable Assets Paper • 2507.15629 • Published Jul 21, 2025 • 23 • 1
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 251
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published Jun 27, 2025 • 36
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published Jun 26, 2025 • 14
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding Paper • 2501.15111 • Published Jan 25, 2025 • 1