From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
upvoted a paper about 2 hours ago
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
upvoted a paper about 2 hours ago
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
upvoted a paper 2 days ago
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation