---
language:
  - it
  - en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
  - lora
  - fine-tuned
  - banking
  - regtech
  - compliance
  - rag
  - tool-calling
  - italian
  - qwen3
pipeline_tag: text-generation
---

# RegTech-4B-Instruct

> **Fine-tuned for RAG-powered banking compliance — not general knowledge.**

A specialized [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) model fine-tuned to excel within a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.

This model doesn't try to memorize regulations — it's trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, resist hallucinations, and maintain professional tone when grounded on regulatory documents.

---

## What This Model Does

This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:

| Task | Description |
|---|---|
| **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| **Tool Calling** | KYC verification, risk scoring, PEP checks, SOS (suspicious transaction) reporting |
| **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| **Intent Detection** | Classify if a message needs document search or is conversational |
| **Document Reranking** | Score candidate documents by relevance |
| **Structured JSON** | Topic extraction, metadata, impact analysis in JSON format |
| **Impact Analysis** | Cross-reference external regulations against internal bank procedures |
| **Hallucination Resistance** | Refuse to fabricate regulations, articles, or sanctions not in context |

---

## Evaluation

### Methodology

We evaluate all fine-tuned models using a **dynamic adversarial benchmark** designed to prevent overfitting to static test sets:

- **Test generation**: An independent LLM generates novel, realistic test scenarios across 13 compliance-specific categories for each evaluation run. Tests are never reused.
- **Blind comparison**: Both the base and fine-tuned model respond to identical prompts. Responses are anonymized and randomly swapped before judging to eliminate position bias.
- **Expert judging**: A frontier-class LLM acts as domain expert judge, scoring each response on 7 criteria (accuracy, context adherence, hallucination resistance, format, tone, instruction following, completeness) on a 1–5 scale.
- **Statistical robustness**: Each evaluation consists of multiple independent loops with fresh test sets, ensuring results are consistent and not artifacts of a single test batch.

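This approach is designed to give a rigorous assessment that closely mirrors real-world compliance assistant performance. Because tests are regenerated for every run, results are compared across independent loops rather than reproduced on a fixed test set.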
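
The blind-comparison step can be illustrated with a minimal sketch. It is not the evaluation harness itself: the function names `blind_pair` and `unblind` and the `A`/`B` labels are assumptions made for illustration. The idea is only that the judge sees two unlabeled answers in random order, and the mapping back to model names is kept separately.

```python
import random


def blind_pair(base_answer, tuned_answer, rng):
    """Return anonymized answers plus the hidden key needed to unblind them."""
    pair = [("base", base_answer), ("tuned", tuned_answer)]
    rng.shuffle(pair)  # random order removes position bias
    shown = {"A": pair[0][1], "B": pair[1][1]}  # what the judge sees
    key = {"A": pair[0][0], "B": pair[1][0]}    # never shown to the judge
    return shown, key


def unblind(verdict, key):
    """Map the judge's verdict ('A', 'B' or 'tie') back to a model name."""
    return "tie" if verdict == "tie" else key[verdict]


rng = random.Random(0)  # seeded only so the sketch is repeatable
shown, key = blind_pair("base model answer", "tuned model answer", rng)
winner = unblind("A", key)  # pretend the judge preferred answer A
```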

### Results — RegTech-4B-Instruct

Evaluated across **73 blind adversarial tests** over 3 independent loops.

#### Head-to-Head vs Base Model

```
                        Base    Tuned
Win Rate (adj.)        45.2%   54.8%
Wins                     26      33
Ties                          14
```
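
The card does not define "adj.", but the reported percentages are reproduced exactly if each tie is counted as half a win for both models. The sketch below makes that assumption explicit; `adjusted_win_rate` is an illustrative name, not part of any evaluation library.

```python
def adjusted_win_rate(wins, losses, ties):
    """Win rate with each tie counted as half a win (assumed definition)."""
    total = wins + losses + ties
    return (wins + ties / 2) / total


# Counts taken from the head-to-head table above (73 tests in total).
tuned = adjusted_win_rate(wins=33, losses=26, ties=14)
base = adjusted_win_rate(wins=26, losses=33, ties=14)
print(f"Tuned: {tuned:.1%}  Base: {base:.1%}")  # Tuned: 54.8%  Base: 45.2%
```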

#### Quality Scores (1–5 scale)

| Criterion | Base | Tuned | Delta | Outcome |
|---|:---:|:---:|:---:|---|
| Hallucination Resistance | 3.53 | **3.89** | +0.36 | Improved |
| Tone & Professionalism | 3.90 | **4.27** | +0.37 | Improved |
| Output Format | 3.41 | **3.75** | +0.34 | Improved |
| Instruction Following | 3.14 | **3.44** | +0.30 | Improved |
| Accuracy | 3.34 | **3.59** | +0.25 | Improved |
| Context Adherence | 3.66 | **3.89** | +0.23 | Improved |
| Completeness | **3.45** | 3.23 | -0.22 | Trade-off |
| **Overall** | **3.49** | **3.72** | **+0.23** | **Improved** |
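
The Overall row is consistent with an unweighted mean of the seven criteria. The card does not state the aggregation, so treat that as an assumption; the short check below shows that it reproduces the reported figures.

```python
from statistics import mean

# (base, tuned) scores per criterion, copied from the table above.
scores = {
    "Hallucination Resistance": (3.53, 3.89),
    "Tone & Professionalism": (3.90, 4.27),
    "Output Format": (3.41, 3.75),
    "Instruction Following": (3.14, 3.44),
    "Accuracy": (3.34, 3.59),
    "Context Adherence": (3.66, 3.89),
    "Completeness": (3.45, 3.23),
}

base_overall = mean(b for b, _ in scores.values())
tuned_overall = mean(t for _, t in scores.values())
delta = tuned_overall - base_overall
print(f"{base_overall:.2f} {tuned_overall:.2f} {delta:+.2f}")  # 3.49 3.72 +0.23
```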

#### Key Safety Improvements

The fine-tuned model demonstrates measurably safer behavior in high-stakes regulatory scenarios:

- **Hallucination traps**: The tuned model correctly refuses fabricated regulations in all tested scenarios. The base model invents plausible-sounding but entirely fictional legal articles and sanctions.
- **Credential protection**: When exposed to prompt injection attacks containing embedded credentials, the tuned model refuses disclosure. The base model has been observed leaking credentials verbatim.
- **Professional tone**: The tuned model drops emoji and filler phrases such as "Certo!" ("Sure!") and "Ottima domanda!" ("Great question!") that are inappropriate in regulatory communications.

#### Known Limitations

- **Completeness trade-off** (-0.22): The model tends toward concise, precise answers. For tasks requiring exhaustive analysis, responses may be shorter than ideal.
- **Query Expansion**: Performance on query rewriting tasks is below the base model. This is a known gap being addressed in dataset improvements.
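- **Inference speed** (a side effect rather than a limitation): average response time is about 39% lower than the base model (4.3 s vs 7.0 s), primarily because outputs are more concise. This is the same conciseness that drives the completeness trade-off.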

#### Consistency Across Loops

| Loop | Base Wins | Tuned Wins | Ties | Tuned % |
|:---:|:---:|:---:|:---:|:---:|
| 1 | 7 | 13 | 5 | 62.0% |
| 2 | 11 | 10 | 2 | 47.8% |
| 3 | 8 | 10 | 7 | 54.0% |

The tuned model wins more comparisons than the base model in 2 of the 3 independent loops (loops 1 and 3). In loop 2 the base model leads 11 to 10.

---

## Usage Examples

### RAG Q&A — Answering from Retrieved Context

```python
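# Italian prompts. System: "You are a banking compliance assistant. Answer ONLY
# from the provided context." User: "What are the minimum capital requirements
# under the CRR?"
messages = [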
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria. 
Rispondi SOLO basandoti sul contesto fornito.

<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti 
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
</contesto_recuperato>"""
    },
    {
        "role": "user", 
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
```
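
In a real pipeline the context block is filled from retriever output, not written by hand. The helper below is a minimal sketch of that step: `build_rag_messages` is a hypothetical name, and the only thing taken from the example above is the prompt wording and the `<contesto_recuperato>` tags.

```python
def build_rag_messages(question, chunks):
    """Wrap retrieved passages in the <contesto_recuperato> tags used above."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    system = (
        "Sei un assistente per la compliance bancaria.\n"
        "Rispondi SOLO basandoti sul contesto fornito.\n\n"
        f"<contesto_recuperato>\n{context}\n</contesto_recuperato>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# One retrieved passage, identical to the hand-written example above.
chunks = [
    "Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti "
    "requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%."
]
question = "Quali sono i requisiti minimi di capitale secondo il CRR?"
messages = build_rag_messages(question, chunks)
```

Keeping the prompt assembly in one function makes it easier to guarantee that every request carries retrieved context, which is what the model was trained to expect.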

### Tool Calling — Compliance Workflows

```python
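# Italian prompts. System: "You are an operational compliance assistant." The
# tools are risk scoring, PEP list check and KYC verification. User: "I need to
# open an account for a company based in Dubai. The legal representative is
# Mr. Al-Rashid."
messages = [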
    {
        "role": "system",
        "content": """Sei un assistente operativo per la compliance.
        
<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>

<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere 
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]
```
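
The card does not show what a tool call looks like in the model's reply. The sketch below assumes the `<tool_call>{...}</tool_call>` JSON convention used by the Qwen chat template of the base model; verify this against real outputs before relying on it. The `nominativo` argument and the stub in `TOOLS` are hypothetical, since the tool parameters are elided above.

```python
import json
import re

# Assumed output convention: one JSON object per <tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return (name, arguments) pairs for every well-formed tool call."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than guessing
        if isinstance(call, dict) and "name" in call:
            calls.append((call["name"], call.get("arguments", {})))
    return calls


# Hypothetical stand-in for a real PEP screening back end.
TOOLS = {
    "controlla_liste_pep": lambda nominativo: {"nominativo": nominativo, "pep": False},
}


def dispatch(calls):
    """Run each requested tool, reporting unknown tool names as errors."""
    results = []
    for name, args in calls:
        if name not in TOOLS:
            results.append({"tool": name, "error": "unknown tool"})
        else:
            results.append({"tool": name, "result": TOOLS[name](**args)})
    return results


sample = (
    "<tool_call>\n"
    '{"name": "controlla_liste_pep", "arguments": {"nominativo": "Al-Rashid"}}\n'
    "</tool_call>"
)
results = dispatch(extract_tool_calls(sample))
```

Rejecting unknown tool names instead of executing them is a deliberate choice for a compliance setting, where a misrouted call should be visible and not silently dropped.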

### Query Expansion — Improving RAG Retrieval

```python
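# Italian prompts. System: "Rewrite the user's query to improve document
# retrieval. Add technical terms and regulatory references. Reply ONLY with
# the JSON." User: "## ORIGINAL QUERY: [suspicious transaction reporting
# obligations]"
messages = [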
    {
        "role": "system",
        "content": "Riscrivi la query dell'utente per migliorare il recupero documentale. Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]
```
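
The JSON schema of the rewritten query is not documented here, so the sketch below uses a hypothetical key, `expanded_query`, purely for illustration. What it shows is a defensive pattern: parse the first JSON object in the reply and fall back to the original query when parsing fails or the key is missing, so retrieval never runs on an empty query.

```python
import json


def parse_json_reply(text):
    """Return the first JSON object found in a model reply, or None."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode stops at the end of the object and ignores trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def expanded_or_original(reply, original_query, key="expanded_query"):
    """Use the rewritten query if present, otherwise keep the original."""
    parsed = parse_json_reply(reply)
    value = parsed.get(key) if parsed else None
    if isinstance(value, str) and value.strip():
        return value
    return original_query


original = "obblighi segnalazione operazioni sospette"
reply = '{"expanded_query": "segnalazione operazioni sospette SOS D.Lgs. 231/2007"}'
query = expanded_or_original(reply, original)
```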

### Document Reranking

```python
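# Italian prompt. System: "Rate the relevance of each candidate to the query.
# Score 0-100. Reply ONLY with the JSON."
messages = [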
    {
        "role": "system",
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Score 0-100. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR"}, {"id": "doc_002", "title": "DORA Art. 5"}]}'
    }
]
```
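
Two practical details around reranking can be sketched in a few lines: serializing the request with `json.dumps` instead of hand-writing the JSON string, and turning the returned scores into an ordering. The reply format, a list of `{"id", "score"}` objects, is an assumption, and both function names are hypothetical.

```python
import json


def build_rerank_payload(query, candidates):
    """Serialize the query and candidates into the user message content."""
    return json.dumps({"query": query, "candidates": candidates}, ensure_ascii=False)


def rank_candidates(scores, candidate_ids):
    """Order candidate ids by score, ignoring unknown ids, clamping to 0-100."""
    cleaned = {}
    for item in scores:
        doc_id = item.get("id")
        score = item.get("score")
        if doc_id in candidate_ids and isinstance(score, (int, float)):
            cleaned[doc_id] = max(0, min(100, score))
    # Candidates the model did not score sink to the bottom with score 0.
    return sorted(candidate_ids, key=lambda d: cleaned.get(d, 0), reverse=True)


candidates = [
    {"id": "doc_001", "title": "Art. 92 CRR"},
    {"id": "doc_002", "title": "DORA Art. 5"},
]
payload = build_rerank_payload("requisiti CET1", candidates)

# Assumed shape of a parsed model reply; doc_999 was never a candidate.
scores = [
    {"id": "doc_002", "score": 12},
    {"id": "doc_001", "score": 95},
    {"id": "doc_999", "score": 100},
]
ranking = rank_candidates(scores, [c["id"] for c in candidates])
```

Dropping ids that were not in the request guards against the model inventing documents, in line with the hallucination-resistance goal of the card.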

---

## Training

### Training Metrics

| Metric | Value |
|---|---|
| Final Eval Loss | 1.368 |
| Token Accuracy | 70.5% |
| Train/Eval Gap | 0.033 |

> A gap of 0.033 between training and evaluation loss suggests stable training without significant overfitting. This is consistent with the model learning domain-specific behavior rather than memorizing the training set.

### Design Principles

The LoRA configuration follows a **minimal intervention** philosophy validated through progressive experimentation across 6+ configurations:

- **Low rank, all modules**: Applying low-rank adapters to all modules produced better results than a high rank on a subset of modules, consistent with findings from the [original LoRA paper](https://arxiv.org/abs/2106.09685).
- **Single epoch**: One pass through the data was sufficient for behavioral adaptation. In these experiments, additional epochs led to catastrophic forgetting on small models.
- **Conservative scaling**: Alpha = 2× rank combined with a low learning rate keeps gradients stable while still giving the adapter update adequate weight.
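
For readers less familiar with LoRA, the scaling rule can be written out using the standard formulation from the LoRA paper. The rank $r$ used for this model is not stated in the card, so it is left symbolic here.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},
\qquad \alpha = 2r \;\Longrightarrow\; \frac{\alpha}{r} = 2
```

With Alpha fixed at twice the rank, the adapter update $BA$ is always scaled by a factor of 2, whatever rank is chosen.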

---

## Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

| Category | Purpose |
|---|---|
| Query Expansion | Enrich queries with regulatory terms for better retrieval |
| Intent Classification | Route queries to RAG vs conversational responses |
| Document Reranking | Score retrieved documents by relevance |
| Topic Extraction | Extract main topics from regulatory text pages |
| Document Summarization | Summarize multi-page regulatory documents |
| Relevance Filtering | Filter regulatory text relevant to banks |
| Metadata Extraction | Find application dates, issuing authorities |
| Impact Analysis | Cross-reference regulations vs internal procedures |
| RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

**Regulatory sources covered:** CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

---

## Deployment

### With vLLM
```bash
vllm serve ./models/RegTech-4B-Instruct --dtype bfloat16
```
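
Once the server is running, it exposes vLLM's OpenAI-compatible API. The client below is a minimal standard-library sketch: it assumes vLLM's default port 8000 and that the model name in the request matches the path passed to `vllm serve` (the default when `--served-model-name` is not set). `build_chat_request` and `chat` are illustrative names.

```python
import json
import urllib.request


def build_chat_request(messages, model, base_url="http://localhost:8000", max_tokens=512):
    """Build a POST request for the /v1/chat/completions endpoint."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, model, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(messages, model, **kwargs)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


# Example (needs the server started with the command above):
# reply = chat(messages, model="./models/RegTech-4B-Instruct")
```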

### With Transformers
```python
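from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID or local path, e.g. "Sophia-AI/RegTech-4B-Instruct".
MODEL_ID = "YOUR_REPO_ID"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Build `messages` as in the Usage Examples, with retrieved context in the
# system prompt. This single-turn placeholder only keeps the snippet runnable.
messages = [{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))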
```

---

## Important Notes

- **RAG-optimized** — Trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- **Domain-specific** — Optimized for Italian banking compliance. General capabilities may differ from the base model.
- **Not legal advice** — A tool to assist compliance professionals, not a substitute for regulatory expertise.
- **Part of a model family** — This 4B model is the lightweight variant. Larger models (7B, 14B, 32B) in the RegTech family offer progressively better completeness and accuracy for more demanding use cases.

---

<p align="center">
  Built for banking RAG by <a href="https://landing.2sophia.ai">2Sophia</a><br>
  <em>Fine-tuned with LoRA &bull; Adversarial evaluation by frontier LLM judges &bull; Powered by Qwen3</em>
</p>