WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 16 days ago • 102
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Paper • 2605.23899 • Published May 22 • 29
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 245
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark Paper • 2605.12501 • Published May 12 • 16
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 119
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation Paper • 2604.15309 • Published Apr 16 • 8
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation Paper • 2603.25732 • Published Mar 26 • 11
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion Paper • 2512.04926 • Published Dec 4, 2025 • 42
Phi-Ground Tech Report: Advancing Perception in GUI Grounding Paper • 2507.23779 • Published Jul 31, 2025 • 46
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL Paper • 2505.24875 • Published May 30, 2025 • 10
ReasonGen-R1 Collection Model and Datasets for the paper "ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL • 7 items • Updated Jun 2, 2025 • 6
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 39
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP modal more SOTA ever. • 10 items • Updated Mar 2 • 69
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation Paper • 2308.00906 • Published Aug 2, 2023 • 15