Laguna M.1-FP8

Laguna M.1-FP8 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work. This is the FP8-quantized variant of Laguna M.1.

This is the FP8 variant. The BF16 and NVFP4 variants are also available on Hugging Face.

Highlights

Large sparse MoE for agentic coding: Laguna M.1 is a 70-layer MoE transformer with 225B total parameters and 23B activated parameters per token
High-capacity expert routing: After 3 dense SwiGLU layers, Laguna M.1 uses 67 sparse MoE layers with 256 experts, top-k=16 routing and auxiliary-loss-free load balancing
Global attention architecture: Laguna M.1 uses global attention across all layers with 64 Q-heads, 8 KV-heads and softplus attention output gating
Native reasoning support: Interleaved thinking between tool calls with support for enabling and disabling thinking per-request
Apache 2.0 license: Use and modify freely for commercial and non-commercial purposes

Model overview

Training: pre-training, post-training and reinforcement learning stages
Number of parameters: 225B total with 23B activated per token
Optimizer: Muon
Layers: 70 layers with global attention
Experts: 256 experts with 1 shared expert; top-k=16 routing
Dense layers: first 3 layers are dense SwiGLU; remaining 67 layers are sparse MoE
Attention: 64 Q-heads, 8 KV-heads, head dimension 128, with softplus attention output gating
Positional encoding: RoPE with YaRN
Modality: text-to-text
Context window: 262,144 tokens
Reasoning support: interleaved thinking with preserved thinking
Quantization: FP8 (weights), detected automatically from quantization_config

Benchmark results

benchmarks

Model	Parameters	SWE-bench Verified	SWE-bench Multilingual	SWE-bench Pro (Public Dataset)	Terminal-Bench 2.0
Laguna M.1 (BF16)	225B-A23B	74.6%	63.1%	49.2%	45.8%
Devstral 2	123B dense	72.2%	61.3%	-	32.6%
GLM-4.7	355B-A32B	73.8%	66.7%	-	41.0%
DeepSeek-V4 Flash	284B-A13B	79.0%	73.3%	52.6%	56.9%
Qwen3.5-397B-A17B	397B-A17B	76.2%	69.3%	50.9%	52.5%
Claude Sonnet 4.6	-	79.6%	-	-	59.1%

Scores shown are for the BF16 reference model; see the main Laguna M.1 model card for full benchmarking methodology. We used the highest publicly-referenced scores for all comparison models across each benchmark.

Usage

Laguna M.1 has upstream support in vLLM, SGLang, and TRT-LLM thanks to the support of the team at NVIDIA.

For complete usage instructions, see the main Laguna M.1 model card.

Deployment

vLLM

The full vLLM recipe is on the main Laguna M.1 model card. Quantization is detected automatically from quantization_config in this checkpoint, so the same command works with poolside/Laguna-M.1-FP8 substituted for the model ID. No extra flags required.

pip install 'vllm>=0.21.0'

vllm serve \
    --model poolside/Laguna-M.1-FP8 \
    --tool-call-parser poolside_v1 \
    --reasoning-parser poolside_v1 \
    --enable-auto-tool-choice \
    --served-model-name laguna \
    --default-chat-template-kwargs '{"enable_thinking": true}'

SGLang

Laguna M.1 is supported in SGLang via sgl-project/sglang#28400. Quantization is detected automatically from quantization_config, so no extra flags are required. A full serving recipe will be added to the main Laguna M.1 model card.

TRT-LLM

Laguna is supported in TensorRT-LLM thanks to the team at NVIDIA (NVIDIA/TensorRT-LLM#13559, with partial-RoPE fusion in #15110). The full recipe is on the main Laguna M.1 model card. Quantization is detected automatically from quantization_config in this checkpoint, so no extra flags are required.

Controlling reasoning

Laguna M.1 has native reasoning support and is designed to work best with preserved thinking, where reasoning content from prior assistant messages is preserved in the message history. This model will generally reason before calling tools and between tool calls. See the main Laguna M.1 model card for streaming, tool-call, and preserved-thinking examples.

Disabling reasoning

You can disable thinking by setting enable_thinking to False in a request or by not providing --default-chat-template-kwargs {"enable_thinking": True} or equivalent when starting the server.

License

This model is licensed under the Apache 2.0 License.

Intended and Responsible Use

Laguna M.1 is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna M.1 is subject to the Apache 2.0 License, and should be used consistently with Poolside's Acceptable Use Policy. We advise against circumventing Laguna M.1 safety guardrails without implementing substantially equivalent mitigations appropriate for your use case.

Please report security vulnerabilities or safety concerns to security@poolside.ai.

Downloads last month: 281

Safetensors

Model size

226B params

Tensor type

F32

BF16

F8_E4M3

Model tree for poolside/Laguna-M.1-FP8

Base model

poolside/Laguna-M.1

Quantized

(10)

this model

Collection including poolside/Laguna-M.1-FP8

Laguna M.1

Collection

Our most capable model to date, designed for long-horizon work. Apache 2.0. • 4 items • Updated about 18 hours ago • 10