Block Diffusion for Flash Speculative Decoding
AI & ML interests
Efficient AI
Papers
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Paper • 2511.10645
z-lab/Qwen3.5-4B-PARO
Image-Text-to-Text • 1B
z-lab/Qwen3.5-9B-PARO
Image-Text-to-Text • 3B
z-lab/Qwen3.5-2B-PARO
Image-Text-to-Text • 1B
Accelerating LLM Fine-Tuning with Contextual Sparsity