Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge Paper • 2312.05693 • Published Dec 9, 2023
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge Paper • 2402.10787 • Published Feb 16, 2024
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model Paper • 2211.11152 • Published Nov 21, 2022
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference Paper • 2411.01171 • Published Nov 2, 2024
RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation Paper • 2501.04315 • Published Jan 8, 2025
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools Paper • 2503.10970 • Published Mar 14, 2025