
Tong Zhu

Spico

https://Spico197.github.io

AI & ML interests

Information Extraction, Mixture-of-Experts, LLM

Recent Activity

upvoted a paper about 15 hours ago

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

upvoted a paper about 15 hours ago

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

upvoted a paper about 19 hours ago

Toward Efficient Agents: Memory, Tool learning, and Planning

Spico's activity

upvoted 2 papers about 15 hours ago

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Paper • 2601.11969 • Published 5 days ago • 24

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Paper • 2601.11655 • Published 6 days ago • 49

upvoted a paper about 19 hours ago

Toward Efficient Agents: Memory, Tool learning, and Planning

Paper • 2601.14192 • Published 1 day ago • 29

upvoted an article 7 days ago

Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Mar 20, 2024 • 109

authored 7 papers 19 days ago

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Paper • 2411.15708 • Published Nov 24, 2024

Iterative Value Function Optimization for Guided Decoding

Paper • 2503.02368 • Published Mar 4, 2025 • 15

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

Paper • 2503.05447 • Published Mar 7, 2025 • 8

Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

Paper • 2503.16779 • Published Mar 21, 2025 • 1

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Paper • 2406.11256 • Published Jun 17, 2024

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Paper • 2508.09834 • Published Aug 13, 2025 • 53

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published 23 days ago • 49

upvoted a paper 20 days ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published 23 days ago • 49

submitted a paper to Daily Papers 20 days ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published 23 days ago • 49

upvoted 2 papers about 2 months ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 119

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 227

liked a Space 2 months ago

屎山文本墓园


An abstract installation that builds "tombstones" for text projects: enter some text and it generates your tombstone.

upvoted 2 papers 2 months ago

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Paper • 2511.13704 • Published Nov 17, 2025 • 42

upvoted an article 2 months ago

Article

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Nov 3, 2025

upvoted a paper 3 months ago

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
