By downloading, accessing, or using the model in the ways indicated, you fully accept the EngGPT Non-Commercial License and the related Acceptable Use Policy (AUP). If you do not agree to the terms and conditions set out therein, you must not download or use the model and must delete any copies already obtained. Any use of the model for commercial purposes is expressly prohibited.
Model Overview
EngGPT2-16B-A3B is the latest iteration of Engineering Ingegneria Informatica S.p.A.'s Italian LLM, built to be a Sovereign, Efficient and Open model. EngGPT2 is trained on 2.5 trillion tokens (compared with Qwen3's 36T or Llama3's 15T) and delivers performance on key benchmarks, including MMLU-Pro, GSM8K, IFEval and HumanEval, comparable to dense models in the 8B–16B range, while requiring between one-fifth and one-half of the inference compute and between one-tenth and one-sixth of the training data, with a corresponding reduction in training compute.

Designed as a trained-from-scratch Mixture-of-Experts (MoE) architecture, EngGPT2 has 16 billion total parameters with 3 billion active per inference, with expert sizes positioned between those used in GPT-OSS and Qwen3. Approximately 25% of its training corpus consists of Italian-language data, delivering strong capabilities on European and Italian NLP tasks among models of similar scale. This efficiency positions EngGPT2 as a key contributor to the growing portfolio of open-weight European models, combining performance and efficiency with full alignment to the EU AI Act.

EngGPT2 is also a single model capable of multiple reasoning modes: non-reasoning, reasoning in Italian or English, and turbo-reasoning (a concise, bullet-point reasoning style, available in both languages, designed for real-time reasoning use cases). EngGPT2 aims to set a new standard for resource-conscious, high-performance LLMs tailored to European and Italian contexts.
Model Details
We introduce EngGPT2-16B-A3B, a highly sparse, medium-scale Mixture-of-Experts (MoE) language model designed as a flexible and efficient foundation for research and advanced downstream applications. Engineered to balance computational efficiency with strong general-purpose reasoning, EngGPT2-16B-A3B supports a wide spectrum of capabilities, including structured reasoning, domain adaptation, tool-augmented workflows, controlled generation, and general conversational proficiency. This release provides full insight into the development pipeline, covering architectural decisions, data preparation, training dynamics, and evaluation methodology across the entire lifecycle of the model. The training process is structured around three distinct but interconnected phases, each with specific objectives and methodological approaches:
- Pre-training: Language Foundation
- Mid-Training: Consolidation and Capability Enhancement
- Post-Training: Alignment and Instruction Following
Model Info
- Model Date: March 30, 2026
- Model developer organization: Engineering Ingegneria Informatica S.p.A.
- Model Languages: English, Italian, French, German, Portuguese and Spanish
- Model Type: EngGPT2 is a Mixture-of-Experts (MoE) transformer with ~16B total parameters and ~3B activated per inference, using sparse expert routing to balance capability and efficiency
- License: Proprietary (EngGPT Non-Commercial License)
Model Architecture
EngGPT2 employs a Mixture‑of‑Experts (MoE) transformer architecture with 24 layers. Each layer includes 64 experts, of which 8 are activated per token via dynamic routing. Because parameters are not shared among experts, each expert can develop stronger specialization. The attention mechanism uses Grouped Query Attention (GQA), and every layer integrates SwiGLU activations, Rotary Positional Embeddings (RoPE), and RMSNorm with pre‑normalization to support stable training and rich representation learning. Each expert is configured with a hidden size of 2880 and an MoE intermediate dimension of 1080, yielding roughly 3B active parameters per forward pass. This design provides ample capacity for high‑quality reasoning while keeping the overall model size within our training budget constraints.
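The paragraph above describes top-8 routing over 64 unshared experts per layer. The model's actual router ships with its remote code and is not reproduced here; as a generic illustration of the mechanism, softmax gating with top-k selection and weight renormalization can be sketched in plain Python:

```python
import math

def top_k_routing(gate_logits, k=8):
    """Generic top-k MoE gating sketch (not the model's actual router):
    softmax over per-expert router logits, keep the k largest, and
    renormalize so the selected expert weights sum to 1."""
    m = max(gate_logits)  # subtract the max to stabilize the softmax
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # indices of the k largest gate probabilities
    chosen = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

# One token's router scores over 64 experts; exactly 8 get nonzero weight.
weights = top_k_routing([math.sin(i) for i in range(64)], k=8)
assert len(weights) == 8
assert abs(sum(weights.values()) - 1.0) < 1e-9
```

Each token's hidden state is then processed only by the selected experts and their outputs are combined with these weights, which is why only ~3B of the 16B parameters are exercised per forward pass.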
Quickstart
To run the model using the Transformers library, make sure you install version 4.57.0, which is required for compatibility with the custom Mixture‑of‑Experts architecture. Since the model directory includes the Python modules defining the architecture, you must enable trust_remote_code=True when loading both the tokenizer and the model. This allows Transformers to correctly import and use the custom implementation shipped with the model.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required so Transformers can import the
# custom MoE modules shipped with the model directory.
tokenizer = AutoTokenizer.from_pretrained("engineering-group/EngGPT2-16B-A3B", trust_remote_code=True)
tokenizer.model_input_names = ["input_ids", "attention_mask"]
model = AutoModelForCausalLM.from_pretrained("engineering-group/EngGPT2-16B-A3B", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Ciao, chi sei?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
Reasoning Behaviour
To control the model’s reasoning behavior, you can configure several options within the chat template:
- enable_thinking=False disables reasoning entirely.
- reasoning_lang="ita" enables reasoning in Italian; reasoning_lang="en" (the default) enables reasoning in English.
- enable_turbo=True enables turbo reasoning, the compressed reasoning modality.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required for the custom MoE implementation.
tokenizer = AutoTokenizer.from_pretrained("engineering-group/EngGPT2-16B-A3B", trust_remote_code=True)
tokenizer.model_input_names = ["input_ids", "attention_mask"]
model = AutoModelForCausalLM.from_pretrained("engineering-group/EngGPT2-16B-A3B", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Ciao, chi sei?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,
    # reasoning_lang="ita",
    # enable_turbo=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
Evaluation
The model is assessed using a broad benchmark suite covering:
- General knowledge: MMLU, MMLU-Pro, MMLU-Redux
- Instruction following: IFEval
- Code generation: HumanEval
- Mathematical reasoning: AIME (2025-2026), GSM8K
- Italian performance: ARC‑IT, MMLU‑IT
- Tool use: BFCL
The model shows competitive performance relative to 7B-class dense models and is close to stronger 14–32B models, despite being an MoE architecture with a smaller active parameter footprint.
| Benchmark | Score |
|---|---|
| MMLU-Pro | 57.3 |
| MMLU-Redux | 75.5 |
| IFEval | 72.0 |
| HumanEval | 64.0 |
| AIME25 (pass@8) | 60.0 |
| AIME26 (pass@8) | 70.0 |
| GSM8K | 88.0 |
| BFCL | 48.5 |
| ARC-IT | 85.6 |
| MMLU-IT | 65.5 |
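The AIME rows above are reported as pass@8. The card does not specify how that statistic was computed; a common choice is the unbiased pass@k estimator of Chen et al. (2021), sketched here for illustration:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions of which
    c are correct, returns 1 - C(n-c, k) / C(n, k), the probability
    that at least one of k randomly drawn samples is correct."""
    if n - c < k:
        # fewer than k incorrect samples: every k-subset contains a correct one
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(8, 1, 1))  # 0.125: one correct answer out of 8 samples
print(pass_at_k(8, 2, 8))  # 1.0: drawing all 8 samples always hits a correct one
```

With n = k = 8 the estimator reduces to "at least one of the 8 samples is correct", so simple best-of-8 scoring gives the same result in that special case.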
Citation
If you use this model or build upon this work, please cite the following paper:
```bibtex
@misc{ciarfaglia2026enggpt2sovereignefficientopen,
  title={EngGPT2: Sovereign, Efficient and Open Intelligence},
  author={G. Ciarfaglia and A. Rosanova and S. Cipolla and J. Bartoli and A. Di Domenico and C. Fioroni and A. Fontana and M. R. Scoleri and M. I. Mone and D. Franchi and M. C. Del Gaudio and A. Leodori and F. Cinti and M. Capozzi and C. Baston and F. Picariello and M. Gabusi and S. Bonura and V. Morreale and I. Bailo},
  year={2026},
  eprint={2603.16430},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.16430},
}
```
