---
language:
- it
- en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- lora
- fine-tuned
- banking
- regtech
- compliance
- rag
- tool-calling
- italian
- qwen3
pipeline_tag: text-generation
---
# RegTech-4B-Instruct
> **Fine-tuned for RAG-powered banking compliance — not general knowledge.**
A specialized [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) model fine-tuned to excel within a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.
This model doesn't try to memorize regulations — it's trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, resist hallucinations, and maintain professional tone when grounded on regulatory documents.
---
## What This Model Does
This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:
| Task | Description |
|---|---|
| **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| **Tool Calling** | KYC verification, risk scoring, PEP checks, SOS (suspicious transaction) reporting |
| **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| **Intent Detection** | Classify if a message needs document search or is conversational |
| **Document Reranking** | Score candidate documents by relevance |
| **Structured JSON** | Topic extraction, metadata, impact analysis in JSON format |
| **Impact Analysis** | Cross-reference external regulations against internal bank procedures |
| **Hallucination Resistance** | Refuse to fabricate regulations, articles, or sanctions not in context |
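As a rough illustration of how these tasks can fit together in one request flow, the sketch below wires the stages as caller-supplied functions. The function name `run_pipeline`, its parameters, and the `top_k` default are all hypothetical; the card does not prescribe an orchestration layer, and retrieval itself is done by an external search component, not by the model.

```python
from typing import Callable, List


def run_pipeline(
    user_message: str,
    needs_search: Callable[[str], bool],            # intent detection
    expand_query: Callable[[str], str],             # query expansion
    retrieve: Callable[[str], List[str]],           # external search, not the model
    rerank: Callable[[str, List[str]], List[str]],  # document reranking
    answer: Callable[[str, List[str]], str],        # grounded Q&A
    top_k: int = 3,                                 # hypothetical default
) -> str:
    """Wire the stages together; every stage is supplied by the caller."""
    if not needs_search(user_message):
        # Conversational message: answer without retrieved context.
        return answer(user_message, [])
    query = expand_query(user_message)
    candidates = retrieve(query)
    best = rerank(user_message, candidates)[:top_k]
    return answer(user_message, best)
```

Keeping each stage behind a plain callable makes it possible to swap a model-backed stage for a rule-based one without touching the flow.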
---
## Evaluation
### Methodology
We evaluate all fine-tuned models using a **dynamic adversarial benchmark** designed to prevent overfitting to static test sets:
- **Test generation**: An independent LLM generates novel, realistic test scenarios across 13 compliance-specific categories for each evaluation run. Tests are never reused.
- **Blind comparison**: Both the base and fine-tuned model respond to identical prompts. Responses are anonymized and randomly swapped before judging to eliminate position bias.
- **Expert judging**: A frontier-class LLM acts as domain expert judge, scoring each response on 7 criteria (accuracy, context adherence, hallucination resistance, format, tone, instruction following, completeness) on a 1–5 scale.
- **Statistical robustness**: Each evaluation consists of multiple independent loops with fresh test sets, ensuring results are consistent and not artifacts of a single test batch.
This approach aims for a rigorous assessment that closely mirrors real-world compliance assistant performance. Because test sets are regenerated for every run, exact scores will vary from one evaluation to the next.
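The blind-comparison step can be sketched as follows. This is a minimal illustration of anonymizing and randomly swapping two responses, not the evaluation harness itself; the function names and the `A`/`B`/`tie` verdict labels are assumptions.

```python
import random
from typing import Dict, Tuple


def anonymize_pair(
    base_answer: str, tuned_answer: str, rng: random.Random
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (responses shown to the judge, hidden key mapping label -> model)."""
    pair = [("base", base_answer), ("tuned", tuned_answer)]
    rng.shuffle(pair)  # random order removes position bias
    shown = {"A": pair[0][1], "B": pair[1][1]}
    key = {"A": pair[0][0], "B": pair[1][0]}
    return shown, key


def resolve_verdict(verdict: str, key: Dict[str, str]) -> str:
    """Map the judge's verdict ('A', 'B', or 'tie') back to 'base', 'tuned', or 'tie'."""
    return key.get(verdict, "tie")
```

The judge only ever sees `shown`; the `key` stays with the harness and is applied after scoring.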
### Results — RegTech-4B-Instruct
Evaluated across **73 blind adversarial tests** over 3 independent loops.
#### Head-to-Head vs Base Model
```
                   Base     Tuned
Win Rate (adj.)    45.2%    54.8%
Wins               26       33
Ties                    14
```
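The card does not define the adjusted win rate, but the reported figures are consistent with counting each tie as half a win for both models. A small check under that assumption:

```python
def adjusted_win_rate(wins: int, losses: int, ties: int) -> float:
    """Win rate in percent, with each tie counted as half a win (assumed definition)."""
    total = wins + losses + ties
    return 100.0 * (wins + ties / 2) / total


tuned = adjusted_win_rate(wins=33, losses=26, ties=14)
base = adjusted_win_rate(wins=26, losses=33, ties=14)
print(f"Base {base:.1f}%  Tuned {tuned:.1f}%")  # Base 45.2%  Tuned 54.8%
```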
#### Quality Scores (1–5 scale)
| Criterion | Base | Tuned | Delta | |
|---|:---:|:---:|:---:|---|
| Hallucination Resistance | 3.53 | **3.89** | +0.36 | Improved |
| Tone & Professionalism | 3.90 | **4.27** | +0.37 | Improved |
| Output Format | 3.41 | **3.75** | +0.34 | Improved |
| Instruction Following | 3.14 | **3.44** | +0.30 | Improved |
| Accuracy | 3.34 | **3.59** | +0.25 | Improved |
| Context Adherence | 3.66 | **3.89** | +0.23 | Improved |
| Completeness | **3.45** | 3.23 | -0.22 | Trade-off |
| **Overall** | **3.49** | **3.72** | **+0.23** | **Improved** |
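The Overall row appears to be the unweighted mean of the seven criteria; the card does not say so explicitly, so treat this as an assumption. The figures in the table can be checked like this:

```python
# (base, tuned) scores copied from the table above.
scores = {
    "Hallucination Resistance": (3.53, 3.89),
    "Tone & Professionalism": (3.90, 4.27),
    "Output Format": (3.41, 3.75),
    "Instruction Following": (3.14, 3.44),
    "Accuracy": (3.34, 3.59),
    "Context Adherence": (3.66, 3.89),
    "Completeness": (3.45, 3.23),
}

base_mean = sum(base for base, _ in scores.values()) / len(scores)
tuned_mean = sum(tuned for _, tuned in scores.values()) / len(scores)
print(f"{base_mean:.2f} {tuned_mean:.2f} {tuned_mean - base_mean:+.2f}")  # 3.49 3.72 +0.23
```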
#### Key Safety Improvements
The fine-tuned model demonstrates measurably safer behavior in high-stakes regulatory scenarios:
- **Hallucination traps**: The tuned model correctly refuses fabricated regulations in all tested scenarios. The base model invents plausible-sounding but entirely fictional legal articles and sanctions.
- **Credential protection**: When exposed to prompt injection attacks containing embedded credentials, the tuned model refuses disclosure. The base model has been observed leaking credentials verbatim.
- **Professional tone**: Eliminates emoji usage and filler phrases such as "Certo!" ("Sure!") and "Ottima domanda!" ("Great question!") that are inappropriate in regulatory communications.
#### Known Limitations
- **Completeness trade-off** (-0.22): The model tends toward concise, precise answers. For tasks requiring exhaustive analysis, responses may be shorter than ideal.
- **Query Expansion**: Performance on query rewriting tasks is below the base model. This is a known gap being addressed in dataset improvements.
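- **Inference speed**: Average response time is about 40% lower than the base model (4.3 s vs 7.0 s). This is primarily a side effect of the more concise outputs, so it reflects the same trade-off as the completeness score.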
#### Consistency Across Loops
| Loop | Base Wins | Tuned Wins | Ties | Tuned % |
|:---:|:---:|:---:|:---:|:---:|
| 1 | 7 | 13 | 5 | 62.0% |
| 2 | 11 | 10 | 2 | 47.8% |
| 3 | 8 | 10 | 7 | 54.0% |
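The tuned model leads in 2 of 3 independent loops (loops 1 and 3). In loop 2 the base model leads by a single win (11 vs 10). The Tuned % column is consistent with counting each tie as half a win.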
---
## Usage Examples
### RAG Q&A — Answering from Retrieved Context
```python
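# System prompt (Italian): "You are a banking compliance assistant.
# Answer ONLY from the provided context." The retrieved text goes
# inside the <contesto_recuperato> ("retrieved context") block.
messages = [
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria.
Rispondi SOLO basandoti sul contesto fornito.
<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
</contesto_recuperato>""",
    },
    {
        # "What are the minimum capital requirements under the CRR?"
        "role": "user",
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?",
    },
]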
```
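In practice the context block is filled from a retriever rather than typed by hand. The helper below is a minimal sketch that reuses the system prompt from the example above; `SYSTEM_TEMPLATE` and `build_rag_messages` are hypothetical names, and joining chunks with a blank line is an assumption rather than a documented format.

```python
from typing import Dict, List

# Same Italian system prompt as the example above, with a slot for the context.
SYSTEM_TEMPLATE = (
    "Sei un assistente per la compliance bancaria.\n"
    "Rispondi SOLO basandoti sul contesto fornito.\n"
    "<contesto_recuperato>\n{context}\n</contesto_recuperato>"
)


def build_rag_messages(chunks: List[str], question: str) -> List[Dict[str, str]]:
    """Place retrieved chunks inside the <contesto_recuperato> block of the system prompt."""
    # Skip empty chunks and separate the rest with a blank line.
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed directly to `tokenizer.apply_chat_template` or to an OpenAI-compatible chat endpoint.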
### Tool Calling — Compliance Workflows
```python
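# System prompt (Italian): "You are an operational compliance assistant."
# <tools> lists the callable tools (risk scoring, PEP list check, KYC
# verification); the parameter schemas are elided in this example.
messages = [
    {
        "role": "system",
        "content": """Sei un assistente operativo per la compliance.
<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>
<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>""",
    },
    {
        # "I need to open an account for a company based in Dubai.
        # The legal representative is Mr. Al-Rashid."
        "role": "user",
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid.",
    },
]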
```
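The card does not show what a tool call looks like in the model's output. The sketch below assumes the reply wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object with `name` and `arguments`, which is the convention used by Qwen chat templates; verify this against real outputs before relying on it. The argument name used in the usage note is hypothetical, since the parameter schemas are elided above.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed tool call found in the model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of guessing
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def dispatch(call: Dict[str, Any], registry: Dict[str, Callable[..., Any]]) -> Any:
    """Run the local function registered under the tool's name."""
    func = registry.get(call["name"])
    if func is None:
        raise KeyError(f"Unknown tool: {call['name']}")
    return func(**call.get("arguments", {}))
```

A registry such as `{"controlla_liste_pep": check_pep}` maps tool names to local functions; a call naming an unregistered tool raises instead of being silently ignored, which is usually the safer default in a compliance workflow.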
### Query Expansion — Improving RAG Retrieval
```python
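messages = [
    {
        # "Rewrite the user's query to improve document retrieval. Add
        # technical terms and regulatory references. Reply ONLY with JSON."
        "role": "system",
        "content": "Riscrivi la query dell'utente per migliorare il recupero documentale. Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON.",
    },
    {
        # "ORIGINAL QUERY: [suspicious transaction reporting obligations]"
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]",
    },
]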
```
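Since the prompt asks for JSON only, the caller still needs to handle replies that are not valid JSON. The sketch below parses the reply and falls back to the original query on failure. The key name `expanded_query` is hypothetical; the card does not document the output schema, so adjust it to what the model actually returns.

```python
import json
from typing import Any, Optional

FENCE = "`" * 3  # markdown code fence some models wrap around JSON


def parse_json_reply(reply: str) -> Optional[Any]:
    """Parse a JSON-only reply, tolerating a surrounding code fence or stray text."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


def expanded_or_original(reply: str, original: str, key: str = "expanded_query") -> str:
    """Use the rewritten query if present, otherwise keep the user's query."""
    parsed = parse_json_reply(reply)
    if isinstance(parsed, dict) and isinstance(parsed.get(key), str) and parsed[key].strip():
        return parsed[key]
    return original
```

Falling back to the original query keeps retrieval working even when the rewrite step fails, which matters given that query expansion is listed as a known weak spot of this model.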
### Document Reranking
```python
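messages = [
    {
        # "Rate the relevance of each candidate to the query.
        # Score 0-100. Reply ONLY with JSON."
        "role": "system",
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Score 0-100. Rispondi SOLO con il JSON.",
    },
    {
        # The user turn is a JSON string with the query and the candidates.
        "role": "user",
        "content": '{"query": "requisiti CET1", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR"}, {"id": "doc_002", "title": "DORA Art. 5"}]}',
    },
]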
```
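Once the scores come back, they have to be applied to the candidate list. The sketch below assumes a reply shaped like `{"scores": [{"id": "doc_001", "score": 95}, ...]}`; that schema, the function name `apply_rerank`, and the `min_score` threshold are assumptions, because the card does not show the model's reranking output.

```python
import json
from typing import Dict, List


def apply_rerank(
    reply: str, candidates: List[Dict[str, str]], min_score: float = 50.0
) -> List[Dict[str, str]]:
    """Order candidates by the model's 0-100 scores and drop those below min_score."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return candidates  # keep retrieval order if the reply is not valid JSON
    items = parsed.get("scores", []) if isinstance(parsed, dict) else []
    scores = {}
    for item in items:
        if isinstance(item, dict) and "id" in item and isinstance(item.get("score"), (int, float)):
            scores[item["id"]] = float(item["score"])
    if not scores:
        return candidates  # nothing usable in the reply
    # Candidates the model did not score are treated as score 0.
    kept = [c for c in candidates if scores.get(c["id"], 0.0) >= min_score]
    return sorted(kept, key=lambda c: scores[c["id"]], reverse=True)
```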
### Training Metrics
| Metric | Value |
|---|---|
| Final Eval Loss | 1.368 |
| Token Accuracy | 70.5% |
| Train/Eval Gap | 0.033 |
> A train/eval gap of 0.033 suggests stable training with little sign of overfitting. These metrics do not measure general capabilities, which may differ from the base model.
### Design Principles
The LoRA configuration follows a **minimal intervention** philosophy validated through progressive experimentation across 6+ configurations:
- **Low rank, all modules**: Modifying all transformer layers with minimal rank produces better results than high rank on a subset of layers — consistent with findings from the [original LoRA paper](https://arxiv.org/abs/2106.09685).
- **Single epoch**: One pass through the data is sufficient for behavioral adaptation. Multiple epochs risk catastrophic forgetting on small models.
- **Conservative scaling**: Alpha = 2× rank with low learning rate ensures stable gradients with adequate signal amplification.
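To make the scaling rule concrete: LoRA multiplies the low-rank update by alpha / rank, so alpha = 2× rank always yields a factor of 2. The sketch below also shows how few parameters a low rank adds per weight matrix. The rank of 8 and the 2560 × 2560 matrix size are illustrative assumptions; the card does not state the actual rank or list the module shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetup:
    rank: int   # hypothetical value; the card does not state the rank
    alpha: int

    @property
    def scaling(self) -> float:
        # LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def added_params(self, d_in: int, d_out: int) -> int:
        # A has shape (rank, d_in) and B has shape (d_out, rank).
        return self.rank * (d_in + d_out)


setup = LoraSetup(rank=8, alpha=16)      # alpha = 2 x rank
print(setup.scaling)                     # 2.0
print(setup.added_params(2560, 2560))    # 40960 added weights for one illustrative matrix
```

For comparison, the full 2560 × 2560 matrix in this illustration has 6,553,600 weights, so the adapter adds well under 1% per adapted matrix.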
---
## Dataset Coverage
The training data covers the full lifecycle of a RAG-based compliance assistant:
| Category | Purpose |
|---|---|
| Query Expansion | Enrich queries with regulatory terms for better retrieval |
| Intent Classification | Route queries to RAG vs conversational responses |
| Document Reranking | Score retrieved documents by relevance |
| Topic Extraction | Extract main topics from regulatory text pages |
| Document Summarization | Summarize multi-page regulatory documents |
| Relevance Filtering | Filter regulatory text relevant to banks |
| Metadata Extraction | Find application dates, issuing authorities |
| Impact Analysis | Cross-reference regulations vs internal procedures |
| RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |
**Regulatory sources covered:** CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.
---
## Deployment
### With vLLM
```bash
vllm serve ./models/RegTech-4B-Instruct --dtype bfloat16
```
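Once the server is running, it exposes an OpenAI-compatible chat endpoint. The sketch below calls it with the standard library only. It assumes vLLM's default port (8000) and that the served model name equals the path passed to `vllm serve`; both change if you pass `--port` or `--served-model-name`. The `temperature` of 0.0 and the helper names are illustrative choices, not settings from the card.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    messages: List[Dict[str, str]],
    model: str = "./models/RegTech-4B-Instruct",  # assumed served model name
    base_url: str = "http://localhost:8000",      # assumed default vLLM address
    max_tokens: int = 512,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # illustrative: deterministic output for compliance answers
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages: List[Dict[str, str]], **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat(messages)` with any of the `messages` lists from the usage examples returns the model's reply as a string.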
### With Transformers
```python
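from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "YOUR_REPO_ID"  # Hub ID or local path of the model

model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Placeholder chat; in practice use a list like those under Usage Examples,
# with the retrieved context in the system prompt.
messages = [{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))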
```
---
## Important Notes
- **RAG-optimized** — Trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- **Domain-specific** — Optimized for Italian banking compliance. General capabilities may differ from the base model.
- **Not legal advice** — A tool to assist compliance professionals, not a substitute for regulatory expertise.
- **Part of a model family** — This 4B model is the lightweight variant. Larger models (7B, 14B, 32B) in the RegTech family offer progressively better completeness and accuracy for more demanding use cases.
---
<p align="center">
Built for banking RAG by <a href="https://landing.2sophia.ai">2Sophia</a><br>
<em>Fine-tuned with LoRA • Adversarial evaluation by frontier LLM judges • Powered by Qwen3</em>
</p>