pepper12138

Pepperhan

AI & ML interests

None yet

upvoted a paper 5 months ago

A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14, 2025 • 34

upvoted 16 papers 7 months ago

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

Paper • 2506.08889 • Published Jun 10, 2025 • 23

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12, 2025 • 19

Inference-Time Hyper-Scaling with KV Cache Compression

Paper • 2506.05345 • Published Jun 5, 2025 • 27

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published Jun 9, 2025 • 19

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Paper • 2506.05344 • Published Jun 5, 2025 • 16

Language-Image Alignment with Fixed Text Encoders

Paper • 2506.04209 • Published Jun 4, 2025 • 11

zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression

Paper • 2506.01084 • Published Jun 1, 2025 • 7

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

Paper • 2506.03065 • Published Jun 3, 2025 • 27

DLP: Dynamic Layerwise Pruning in Large Language Models

Paper • 2505.23807 • Published May 27, 2025 • 4

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Paper • 2505.21541 • Published May 24, 2025 • 7

Rectified Sparse Attention

Paper • 2506.04108 • Published Jun 4, 2025 • 10

Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model

Paper • 2505.17561 • Published May 23, 2025 • 31

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

Paper • 2505.16839 • Published May 22, 2025 • 13

Training-Free Efficient Video Generation via Dynamic Token Carving

Paper • 2505.16864 • Published May 22, 2025 • 24

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Paper • 2505.16933 • Published May 22, 2025 • 34

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22, 2025 • 41

upvoted 3 papers 8 months ago

X-Fusion: Introducing New Modality to Frozen Large Language Models

Paper • 2504.20996 • Published Apr 29, 2025 • 13

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14, 2025 • 98

Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16, 2025 • 57