---
license: other
license_name: cohere-license
license_link: https://huggingface.co/CohereLabs/command-a-plus-05-2026
base_model: CohereLabs/command-a-plus-05-2026
tags:
- quantization
- int2
- int4
- mixture-of-experts
- command-a-plus
library_name: command-a-plus-lite
---
# Command-A-Plus-Lite (int2 experts / int4 resident)
Pre-quantized weights for running Cohere's **Command-A-Plus** (218B-parameter
Mixture-of-Experts, 25B active) on a **single 24GB GPU**.
| Component | Precision | Where |
|---|---|---|
| Routed experts (128/layer) | **int2**, group-wise (g=64) | CPU RAM, streamed per active expert |
| Attention q/k/v/o + shared experts + embedding | **int4**, group-wise (g=64) | GPU-resident |
| Router gate / layernorms | fp16 | GPU-resident |
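
The table names the scheme but not its mechanics. The following is a minimal NumPy sketch of group-wise round-to-nearest (RTN) quantization, the uncalibrated method the License section says was used for the int2 experts. It assumes an asymmetric min/max mapping with one scale and one offset per 64-weight group, plus a simple four-codes-per-byte int2 packing. Those choices, and every function name below, are hypothetical illustrations, not the checkpoint's actual on-disk format or the runtime's API.

```python
import numpy as np


def quantize_groupwise_rtn(w, bits, group_size=64):
    """Round-to-nearest (RTN) quantization with one scale/offset per group of weights.

    No calibration data: each group's range comes only from its own min and max.
    Returns integer codes in [0, 2**bits - 1] plus the per-group scale and offset.
    """
    w = np.asarray(w, dtype=np.float32)
    if w.size % group_size:
        raise ValueError("weight count must be a multiple of group_size")
    groups = w.reshape(-1, group_size)
    levels = 2**bits - 1                      # int2 -> codes 0..3, int4 -> codes 0..15
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale[scale == 0] = 1.0                   # constant group: any nonzero scale works
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo                       # scale/offset kept in float32 here


def dequantize_groupwise(q, scale, lo, shape):
    """Map codes back to floats: w_hat = q * scale + lo, then restore the shape."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)


def pack_int2(q):
    """Pack four 2-bit codes into each byte (hypothetical layout, low bits first)."""
    flat = q.reshape(-1, 4).astype(np.uint8)
    return (flat[:, 0] | (flat[:, 1] << 2) | (flat[:, 2] << 4) | (flat[:, 3] << 6)).astype(np.uint8)


def unpack_int2(packed):
    """Inverse of pack_int2: recover the flat sequence of 2-bit codes."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return ((packed[:, None] >> shifts) & 0b11).reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 64)).astype(np.float32)   # toy weight matrix
    for bits in (2, 4):
        q, scale, lo = quantize_groupwise_rtn(w, bits)
        err = np.abs(w - dequantize_groupwise(q, scale, lo, w.shape)).mean()
        print(f"int{bits}: mean abs error {err:.4f}")
    print("packed int2 bytes:", pack_int2(quantize_groupwise_rtn(w, 2)[0]).nbytes)
```

With RTN each weight lands within half a step (`scale / 2`) of its original value, and an int2 step is five times wider than an int4 step over the same range (3 levels vs. 15), which is why the int2 experts lose the most quality.
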
```
weights on disk ~67 GB
resident VRAM ~8.4 GB
host RAM (pinned) ~61 GB (peaks ~108 GB during load)
decode speed ~0.3 tok/s (single 24GB GPU, --pin --gemlite)
```
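
Group-wise quantization is not exactly 2 or 4 bits per weight, because every group also stores metadata. The back-of-envelope helper below assumes 32 bits of metadata per 64-weight group (one fp16 scale plus one fp16 offset); that layout is an assumption, not something this card states. Under it, int2 costs 2.5 bits per weight and int4 costs 4.5. The real files also hold fp16 tensors and format overhead, so treat the measured sizes above as authoritative; the function names are hypothetical.

```python
def effective_bits_per_weight(bits: int, group_size: int = 64, meta_bits_per_group: int = 32) -> float:
    """Payload bits plus per-group metadata amortized over the group.

    meta_bits_per_group=32 assumes one fp16 scale and one fp16 offset per group
    (hypothetical layout; the real checkpoint may store more or less).
    """
    return bits + meta_bits_per_group / group_size


def storage_gb(n_params: float, bits: int, group_size: int = 64, meta_bits_per_group: int = 32) -> float:
    """Decimal gigabytes needed for n_params weights at the given quantized width."""
    return n_params * effective_bits_per_weight(bits, group_size, meta_bits_per_group) / 8 / 1e9


if __name__ == "__main__":
    for name, bits in [("int2", 2), ("int4", 4)]:
        print(f"{name}: {effective_bits_per_weight(bits):.2f} bits/weight, "
              f"{storage_gb(1e9, bits):.4f} GB per 1B params")
```
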
Decode is **transfer-bound**: copying the active experts' weights from CPU RAM to the GPU dominates each step. This build therefore trades speed for capacity, fitting a 218B model on one 24GB card rather than maximizing throughput.
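
To see why streaming sets the pace, a rough model helps: for each decoded token, every MoE layer's router picks its top-k experts, those experts' weights cross the host-to-GPU link, and the link bandwidth divided by the bytes moved per token caps the token rate. The sketch below assumes no GPU-side expert cache and full link utilization. Only the 128 routed experts per layer comes from this card; the layer count, top-k, expert size and bandwidth in the demo are placeholders, and all function names are hypothetical.

```python
import numpy as np


def experts_to_stream(router_logits: np.ndarray, top_k: int) -> np.ndarray:
    """Indices of the top-k experts the router selects for one token in one layer.

    These are the expert weights that must be copied host -> GPU before the
    layer's MoE block can run (assuming no GPU-side expert cache).
    """
    return np.argsort(router_logits)[::-1][:top_k]


def bytes_streamed_per_token(n_layers: int, top_k: int, expert_bytes: int) -> int:
    """Host-to-device traffic for one decoded token if every selected expert is fetched."""
    return n_layers * top_k * expert_bytes


def decode_tok_s_upper_bound(bytes_per_token: float, link_gb_s: float) -> float:
    """Tokens/s ceiling from transfers alone: link bandwidth / bytes moved per token.

    Assumes the link runs at full bandwidth and ignores compute and per-copy
    latency, so measured throughput will be lower.
    """
    return link_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not this model's real configuration.
    rng = np.random.default_rng(0)
    logits = rng.standard_normal(128)     # one router score per routed expert (128/layer)
    print("experts to fetch:", experts_to_stream(logits, top_k=8))
    per_token = bytes_streamed_per_token(n_layers=64, top_k=8, expert_bytes=20_000_000)
    print(f"{per_token / 1e9:.2f} GB/token, "
          f"<= {decode_tok_s_upper_bound(per_token, link_gb_s=25.0):.2f} tok/s")
```
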
## Usage
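Clone and install the runtime from <https://github.com/kizuna-intelligence/Command-A-Plus-Lite>, then download the weights:
```bash
git clone https://github.com/kizuna-intelligence/Command-A-Plus-Lite
cd Command-A-Plus-Lite
pip install -e ".[gemlite]"   # editable install, including the optional gemlite extra
hf download kizuna-intelligence/Command-A-Plus-Lite --local-dir ./cmda_int4
```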
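```python
import torch
from command_a_plus_lite import load_quantized

# Resident weights (attention, shared experts, embedding, router, norms) go to cuda:0;
# the int2 routed experts stay in host RAM and are streamed to the GPU as needed.
model = load_quantized(
    "./cmda_int4",          # the --local-dir used for `hf download`
    device="cuda:0",
    dtype=torch.float16,
    pin_experts=True,       # keep experts in pinned host memory (cf. --pin)
    use_gemlite=True,       # use the GemLite low-bit kernels (cf. --gemlite)
)
```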
The tokenizer is **not** included here — use the one from the base model
[`CohereLabs/command-a-plus-05-2026`](https://huggingface.co/CohereLabs/command-a-plus-05-2026).
## License
The model weights are governed by **Cohere's license** for Command-A-Plus.
The runtime code is MIT (see the GitHub repository). The int2 routed experts use blind
round-to-nearest (RTN) quantization with no calibration data, so output quality is below
that of the bf16 original.