---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---

# KV Cache Calculator

Calculate KV cache memory requirements for transformer models.

## Credits

This implementation builds upon the excellent work by [gaunernst](https://huggingface.co/spaces/gaunernst/kv-cache-calculator). Special thanks for the original implementation!

## Features

- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped Query Attention), and MLA (Multi-head Latent Attention); see the sizing sketch after this list
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: Instant memory requirement estimates
- **Model analysis**: Detailed breakdown of the model configuration
- **Universal compatibility**: Works with any HuggingFace transformer model
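
To give a sense of where the numbers come from: for MHA/GQA models, two tensors (K and V) are cached per layer, each shaped `[users, kv_heads, context, head_dim]`, while MLA caches a single compressed latent per token per layer, shared across heads. The sketch below is a minimal illustration of these formulas and is not necessarily identical to what `app.py` does; the MLA variant assumes DeepSeek-style caching of the compressed KV latent plus the decoupled RoPE key.

```python
# Minimal sketch of the KV cache estimates (illustrative, not app.py's
# exact logic). fp4 is counted as half a byte per element.
BYTES_PER_ELEM = {"fp16/bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_len: int, num_users: int,
                 dtype: str = "fp16/bf16") -> float:
    """MHA/GQA: K and V are each cached per layer with shape
    [num_users, num_kv_heads, context_len, head_dim]."""
    elems = 2 * num_layers * num_kv_heads * head_dim * context_len * num_users
    return elems * BYTES_PER_ELEM[dtype] / 1024**3


def mla_cache_gib(num_layers: int, kv_lora_rank: int, qk_rope_head_dim: int,
                  context_len: int, num_users: int,
                  dtype: str = "fp16/bf16") -> float:
    """MLA (DeepSeek-style, assumed here): per token and layer, the cache
    holds one compressed KV latent plus one decoupled RoPE key, shared
    across all heads -- hence no factor of 2 and no per-head term."""
    elems = num_layers * (kv_lora_rank + qk_rope_head_dim) * context_len * num_users
    return elems * BYTES_PER_ELEM[dtype] / 1024**3


# Hypothetical 32-layer GQA model with 8 KV heads and head_dim 128,
# serving 4 users at 32k context in bf16:
# 2 * 32 * 8 * 128 * 32768 * 4 * 2 B = 16.0 GiB
print(f"{kv_cache_gib(32, 8, 128, 32768, 4):.1f} GiB")
```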

## Usage

1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B")
2. Set the context length and number of users
3. Choose the data type precision
4. Add a HuggingFace token if needed for gated models
5. Click calculate to get the memory requirements (a rough sketch of what this does follows below)
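
For illustration, the sketch below reads a model's configuration from the Hub and applies the GQA estimate from the Features section. The use of `transformers.AutoConfig`, the field names (`num_hidden_layers`, `num_key_value_heads`, `head_dim`), and the context/user values are assumptions for this example; field names vary by architecture, and the Space's own `app.py` may fetch the config differently.

```python
from transformers import AutoConfig

# token=... is only needed for gated models (step 4); the model ID here
# matches the example from step 1.
config = AutoConfig.from_pretrained("Qwen/Qwen3-30B-A3B")

num_layers = config.num_hidden_layers
num_kv_heads = config.num_key_value_heads  # fewer than num_attention_heads for GQA
# Fall back to hidden_size / num_attention_heads if head_dim is absent
# (architecture-dependent; an assumption for this sketch).
head_dim = getattr(config, "head_dim",
                   config.hidden_size // config.num_attention_heads)

context_len, num_users = 32768, 4
elems = 2 * num_layers * num_kv_heads * head_dim * context_len * num_users
print(f"{elems * 2 / 1024**3:.1f} GiB in bf16")  # 2 bytes per element
```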