Potential9D - a ali0001010010 Collection

updated about 9 hours ago

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Paper • 2602.20161 • Published Feb 23 • 23
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 523
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125
AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published Apr 5 • 50
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 113
Memory Intelligence Agent

Paper • 2604.04503 • Published Apr 6 • 58
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Paper • 2604.10905 • Published 30 days ago • 28
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published 28 days ago • 118
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Paper • 2604.22446 • Published 19 days ago • 121
X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Paper • 2605.05765 • Published 6 days ago • 17