LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 23 days ago • 77
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published Nov 28, 2025 • 22
QUBE: Enhancing Automatic Heuristic Design via Quality-Uncertainty Balanced Evolution Paper • 2412.20694 • Published Dec 30, 2024
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts Paper • 2508.07785 • Published Aug 11, 2025 • 28
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs Paper • 2508.05257 • Published Aug 7, 2025 • 13
Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace Paper • 2310.19651 • Published Oct 30, 2023
Value Residual Learning For Alleviating Attention Concentration In Transformers Paper • 2410.17897 • Published Oct 23, 2024 • 9