Zikai Zhou

Klayand

https://klayand.github.io/

AI & ML interests

Knowledge Distillation, Generative Models

Organizations

None yet

upvoted 3 papers 7 days ago

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

Paper • 2606.14777 • Published 13 days ago • 197

UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer

Paper • 2606.16255 • Published 8 days ago • 14

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Paper • 2606.17030 • Published 8 days ago • 28

upvoted 3 papers 12 days ago

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Paper • 2606.09076 • Published 15 days ago • 61

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Paper • 2606.11025 • Published 14 days ago • 41

Kwai Keye-VL-2.0 Technical Report

Paper • 2606.10651 • Published 14 days ago • 189

upvoted 2 papers 19 days ago

Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published 22 days ago • 132

Qwen-Image-Flash: Beyond Objective Design

Paper • 2606.03746 • Published 21 days ago • 36

upvoted 2 papers 22 days ago

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Paper • 2605.30409 • Published 26 days ago • 41

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Paper • 2605.31604 • Published 25 days ago • 61

upvoted a paper 25 days ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 26 days ago • 146

upvoted 3 papers 29 days ago

Geo-Align: Video Generation Alignment via Metric Geometry Reward

Paper • 2605.23903 • Published May 22 • 10

StepAudio 2.5 Technical Report

Paper • 2605.23463 • Published May 22 • 49

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Paper • 2605.21573 • Published May 20 • 111

upvoted 3 papers about 1 month ago

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Paper • 2605.15178 • Published May 14 • 91

Qwen-Image-VAE-2.0 Technical Report

Paper • 2605.13565 • Published May 13 • 62

Qwen-Image-2.0 Technical Report

Paper • 2605.10730 • Published May 11 • 114

upvoted 3 papers about 2 months ago

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Paper • 2605.06376 • Published May 7 • 27

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

Paper • 2605.05204 • Published May 6 • 28

Lightning Unified Video Editing via In-Context Sparse Attention

Paper • 2605.04569 • Published May 6 • 18