# Laguna-XS.2: Pre-compiled for AWS Neuron (trn2.3xlarge)
Pre-compiled and pre-sharded model artifacts for serving poolside/Laguna-XS.2 on AWS Trainium2 using NxD Inference.
## Configuration
- Instance: trn2.3xlarge (LNC=2, 4 logical cores)
- TP degree: 4
- Batch size: 4 for token generation (TKG), 1 for context encoding (CTE)
- Max sequence length: 4096
- Precision: BF16
- SDK: Neuron SDK 2.29 (neuronx-cc 2.24, NxDI 0.9.17334)
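The TP degree is chosen to match the instance's logical core count. A quick sanity check of that mapping (the figure of 8 physical NeuronCores per Trainium2 chip is an assumption; LNC=2 fuses two physical cores into one logical core):

```shell
physical_cores=8   # assumed: one Trainium2 chip on trn2.3xlarge
lnc=2              # Logical NeuronCore configuration from this build
logical_cores=$(( physical_cores / lnc ))
tp_degree=4
echo "logical cores: $logical_cores, TP ranks: $tp_degree"  # one TP rank per logical core
```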
## Files
| Size | Description |
|---|---|
| 4.3 GB | Compiled NEFFs (6 CTE + 6 TKG buckets) |
| 12 KB | NxDI inference configuration |
| 16 GB | Sharded weights for TP rank 0 |
| 16 GB | Sharded weights for TP rank 1 |
| 16 GB | Sharded weights for TP rank 2 |
| 16 GB | Sharded weights for TP rank 3 |
## Usage with vLLM
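A minimal launch sketch for an OpenAI-compatible server backed by these artifacts. The environment variable names and the `--device` flag are assumptions that depend on your vLLM and Neuron SDK versions (recent vLLM selects the Neuron platform automatically); the TP degree, sequence length, and batch size must match the compiled configuration above.

```shell
# Assumed env vars from the AWS Neuron vLLM integration; verify against your SDK docs.
export VLLM_NEURON_FRAMEWORK="neuronx-distributed-inference"
export NEURON_COMPILED_ARTIFACTS="./"   # directory holding this repo's NEFFs and config

vllm serve jburtoft/Laguna-XS2-neuron-compiled \
  --tensor-parallel-size 4 \
  --max-model-len 4096 \
  --max-num-seqs 4
```

Passing values larger than the compiled `--max-model-len` or `--max-num-seqs` will fail at load time, since the NEFF buckets are fixed at compile time.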
## Performance
| Metric | Value |
|---|---|
| Throughput (BS=1) | ~50 tok/s (via vLLM) |
| Throughput (BS=4, raw) | 223 tok/s |
| Throughput (BS=8, raw) | 310 tok/s |
| TPOT, time per output token (BS=1) | 11 ms |
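TPOT and single-stream decode rate are reciprocals: an 11 ms TPOT at BS=1 implies roughly 1000/11 ≈ 91 raw decode tokens/s, so the gap to the ~50 tok/s figure reflects the vLLM serving path rather than the hardware. A quick check:

```shell
tpot_ms=11
raw_tok_s=$(awk -v t="$tpot_ms" 'BEGIN { printf "%.1f", 1000 / t }')
echo "implied raw decode rate: ${raw_tok_s} tok/s"
```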
## Requirements
- AWS trn2.3xlarge instance
- Neuron SDK 2.29 (DLAMI 20260410)
- NxDI fork with Laguna contrib
## Model tree for jburtoft/Laguna-XS2-neuron-compiled
- Base model: poolside/Laguna-XS.2