Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching
Paper • 2602.12221 • Published
PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation