- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
  Paper • 2311.12631 • Published • 15
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 58
- VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
  Paper • 2504.01956 • Published • 41
- UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
  Paper • 2506.23219 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2512.19134
- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 35
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 33
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 23
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 26
- facebook/w2v-bert-2.0
  Feature Extraction • 0.6B • Updated • 3.52M • 199
- facebook/metaclip-h14-fullcc2.5b
  Zero-Shot Image Classification • 1.0B • Updated • 13k • 48
- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B • Updated • 8.15M • 1.94k
- Salesforce/blip-image-captioning-large
  Image-to-Text • 0.5B • Updated • 1.17M • 1.44k
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
  Paper • 2511.21678 • Published • 11
- QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
  Paper • 2512.19134 • Published • 31
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
  Paper • 2512.16969 • Published • 105
- CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
  Paper • 2512.19535 • Published • 8
- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 465 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88