	---
	language:
	- it
	- en
	license: apache-2.0
	library_name: transformers
	base_model: Qwen/Qwen2.5-14B-Instruct
	tags:
	- lora
	- fine-tuned
	- banking
	- regtech
	- compliance
	- rag
	- tool-calling
	- italian
	- qwen2.5
	pipeline_tag: text-generation
	---

	# 🏦 RegTech-14B-Instruct

	> Fine-tuned for RAG-powered banking compliance — not general knowledge.

	A specialized [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model fine-tuned to excel within a Retrieval-Augmented Generation (RAG) pipeline for Italian banking regulatory compliance.

	This model doesn't try to memorize regulations — it's trained to work with retrieved context: follow instructions precisely, produce structured outputs, call compliance tools, and maintain the right tone and terminology when grounded on regulatory documents.

	---

	## 🎯 What This Model Does

	This fine-tuning optimizes the model's behavior within a RAG system, not its factual knowledge. Specifically:

	| Task | Description |
	|---|---|
	| 📋 RAG Q&A | Answer regulatory questions grounded on retrieved documents |
	| 🔧 Tool Calling | KYC verification, risk scoring, PEP checks, SOS reporting |
	| 🔍 Query Expansion | Rewrite user queries with regulatory terminology for better retrieval |
	| 🧠 Intent Detection | Classify if a message needs document search or is conversational |
	| 📊 Document Reranking | Score candidate documents by relevance |
	| 📝 Structured JSON | Topic extraction, metadata, impact analysis in JSON format |
	| ⚖️ Impact Analysis | Cross-reference external regulations against internal bank procedures |

	---

	## 📈 Evaluation — LLM-as-Judge

	Evaluated by Claude Opus 4.6 (Anthropic) across 11 blind test scenarios. The judge compared base vs fine-tuned model outputs without knowing which was which.

	### 🏆 Head-to-Head

	```
	┌─────────────────────────────────────────┐
	│ 🟢 Tuned Wins 8/11 (77.3%) │
	│ 🔴 Base Wins 2/11 (22.7%) │
	│ ⚪ Ties 1/11 │
	└─────────────────────────────────────────┘
	```
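The percentages in the box are not plain win fractions (8/11 would be about 72.7%). They are consistent with counting the tie as half a win for each side; that convention is inferred from the numbers and is not stated by the card. A minimal sketch of the arithmetic under that assumption:

```python
def win_rate(wins: int, ties: int, total: int) -> float:
    """Win rate in percent, counting each tie as half a win (assumed convention)."""
    return 100.0 * (wins + 0.5 * ties) / total


tuned = win_rate(wins=8, ties=1, total=11)
base = win_rate(wins=2, ties=1, total=11)
print(f"tuned: {tuned:.1f}%  base: {base:.1f}%")  # tuned: 77.3%  base: 22.7%
```

The same convention reproduces the 68.2% reported for the other model sizes if they scored 7.5 out of 11.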

	### 📊 Quality Scores (1–5)

	| Criterion | Base | Tuned | Delta | |
	|---|:---:|:---:|:---:|---|
	| 🎯 Instruction Following | 3.55 | 4.64 | +1.09 | 🟢🟢🟢 |
	| 📎 Context Adherence | 3.82 | 4.82 | +1.00 | 🟢🟢 |
	| ✅ Accuracy | 4.00 | 4.73 | +0.73 | 🟢🟢 |
	| 📐 Format | 4.18 | 4.45 | +0.27 | 🟢 |
	| 🗣️ Tone | 4.73 | 4.82 | +0.09 | ➖ |
	| 📊 Overall | 4.06 | 4.69 | +0.64 | 🟢🟢 |

	> At 77.3%, the 14B model has the highest win rate of the four model sizes compared below. Instruction following improves by +1.09 and context adherence by +1.00, the two largest gains, which indicates that the fine-tuning mainly improves the model's ability to follow instructions and stay grounded on retrieved regulatory context.
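The Overall row matches the unweighted mean of the five criteria. That also explains why the reported delta is +0.64 rather than 4.69 - 4.06 = 0.63: the difference is taken before rounding. The sketch below checks the published numbers under that assumption; it does not describe the judge's actual scoring procedure.

```python
from statistics import mean

# (base, tuned) scores copied from the quality table above
scores = {
    "Instruction Following": (3.55, 4.64),
    "Context Adherence": (3.82, 4.82),
    "Accuracy": (4.00, 4.73),
    "Format": (4.18, 4.45),
    "Tone": (4.73, 4.82),
}

base_overall = mean(b for b, _ in scores.values())
tuned_overall = mean(t for _, t in scores.values())
delta = tuned_overall - base_overall
print(f"base {base_overall:.2f}, tuned {tuned_overall:.2f}, delta {delta:+.2f}")
```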

	### 📂 Results by Category

	| Category | Base wins | Tuned wins | Ties |
	|---|:---:|:---:|:---:|
	| 📖 RAG Q&A | 0 | 2 | 0 |
	| 🚫 Refusal Handling | 0 | 2 | 0 |
	| ⚠️ Edge Cases | 0 | 1 | 0 |
	| 🎨 Style & Tone | 0 | 1 | 0 |
	| 📤 Data Extraction | 0 | 0 | 1 |
	| 📋 JSON Output | 1 | 1 | 0 |
	| 🔧 Tool Use | 1 | 1 | 0 |

	### 🔄 Comparison Across Model Sizes

	| Metric | 4B | 7B | 14B | 32B |
	|---|:---:|:---:|:---:|:---:|
	| Base score (pre-tuning) | 4.11 | 3.84 | 4.06 | 4.36 |
	| Tuned score | 4.68 | 4.78 | 4.69 | 4.80 |
	| Delta (improvement) | +0.57 | +0.95 | +0.64 | +0.44 |
	| Win rate | 68.2% | 68.2% | 77.3% | 68.2% |
	| Best eval loss | 1.191 | 1.330 | 1.225 | 0.813 |
	| Token accuracy | ~73% | ~72% | ~72% | ~81% |

	---

	## 💡 Usage Examples

	### 📋 RAG Q&A — Answering from Retrieved Context

	The model is designed to receive retrieved regulatory documents as context and answer based on them:

	```python
	# System prompt (Italian): "You are a banking compliance assistant.
	# Answer ONLY on the basis of the provided context."
	# The retrieved regulatory text goes inside <contesto_recuperato>.
	messages = [
	    {
	        "role": "system",
	        "content": """Sei un assistente per la compliance bancaria.
	Rispondi SOLO basandoti sul contesto fornito.

	<contesto_recuperato>
	Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti
	requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
	Il coefficiente è calcolato come rapporto tra i fondi propri e
	l'importo complessivo dell'esposizione al rischio.
	</contesto_recuperato>""",
	    },
	    {
	        "role": "user",
	        # "What are the minimum capital requirements under the CRR?"
	        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?",
	    },
	]
	```
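In a running pipeline the system prompt is assembled from whatever the retriever returns. A minimal sketch, assuming the retrieved chunks arrive as plain strings and reusing the `<contesto_recuperato>` tag from the example above; `build_rag_messages` is a hypothetical helper, not part of the model's tooling.

```python
def build_rag_messages(chunks: list[str], question: str) -> list[dict]:
    """Wrap retrieved chunks in the context tag and build the chat messages."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    system = (
        "Sei un assistente per la compliance bancaria.\n"
        "Rispondi SOLO basandoti sul contesto fornito.\n\n"
        f"<contesto_recuperato>\n{context}\n</contesto_recuperato>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    [
        "Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti "
        "requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%."
    ],
    "Quali sono i requisiti minimi di capitale secondo il CRR?",
)
```

Keeping the instruction text and the context tag identical to the example avoids drifting away from the prompt shape the card demonstrates.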

	### 🔍 Query Expansion — Improving RAG Retrieval

	```python
	# System prompt (Italian): "Rewrite the user's query into a richer version to
	# improve document retrieval (RAG). Add technical terms and regulatory
	# references. Reply ONLY with the requested JSON."
	messages = [
	    {
	        "role": "system",
	        "content": "Riscrivi la query dell'utente in una versione più ricca per migliorare il recupero documentale (RAG). Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON richiesto.",
	    },
	    {
	        "role": "user",
	        # "ORIGINAL QUERY: [suspicious transaction reporting obligations]"
	        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]",
	    },
	]

	# Expected output:
	# {"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007
	# art. 35 riciclaggio finanziamento terrorismo portale RADAR tempistiche
	# invio indicatori anomalia"}
	```
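Because the prompt asks for JSON only, the caller still has to parse the reply and decide what to do when parsing fails. A minimal sketch, assuming the reply has the `{"query": ...}` shape shown above; `parse_expanded_query` is a hypothetical helper, and falling back to the original query is only one possible policy.

```python
import json


def parse_expanded_query(reply: str, original: str) -> str:
    """Return the expanded query, or the original query if the reply is unusable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return original
    expanded = data.get("query") if isinstance(data, dict) else None
    if isinstance(expanded, str) and expanded.strip():
        return expanded.strip()
    return original


reply = '{"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007"}'
print(parse_expanded_query(reply, "obblighi segnalazione operazioni sospette"))
```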

	### 🔧 Tool Calling — Compliance Workflows

	```python
	# System prompt (Italian): "You are an operational assistant for compliance."
	# Retrieved context: "Procedure AML-003: enhanced due diligence (EDD) must be
	# applied for PEPs, high-risk countries and profiles with scoring > 60."
	messages = [
	    {
	        "role": "system",
	        "content": """Sei un assistente operativo per la compliance.

	<tools>
	{"name": "calcola_scoring_rischio", "parameters": {...}}
	{"name": "controlla_liste_pep", "parameters": {...}}
	{"name": "verifica_kyc", "parameters": {...}}
	</tools>

	<contesto_recuperato>
	Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere
	applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
	</contesto_recuperato>""",
	    },
	    {
	        "role": "user",
	        # "I need to open an account for a company based in Dubai.
	        # The legal representative is Mr. Al-Rashid."
	        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid.",
	    },
	]

	# The model is expected to:
	# 1. Call controlla_liste_pep for the representative
	# 2. Call calcola_scoring_rischio based on risk factors
	# 3. Recommend EDD procedure per AML-003, grounded on retrieved policy
	```
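The card does not show the format in which the model emits tool calls. A minimal sketch of the dispatch side, assuming Qwen2.5-style `<tool_call>{...}</tool_call>` blocks with `name` and `arguments` keys. The tool body, its `nominativo` argument, and its return value are placeholders for illustration, not the signatures used in training.

```python
import json
import re

# Assumed output format: one JSON object per <tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def controlla_liste_pep(nominativo: str) -> dict:
    # Placeholder: a real implementation would query a PEP screening service.
    return {"nominativo": nominativo, "pep": False}


TOOLS = {"controlla_liste_pep": controlla_liste_pep}


def run_tool_calls(reply: str) -> list[dict]:
    """Execute every recognised tool call in the reply and collect tool messages."""
    results = []
    for match in TOOL_CALL_RE.finditer(reply):
        call = json.loads(match.group(1))
        tool = TOOLS.get(call.get("name"))
        if tool is None:
            continue  # unknown tool: skip rather than guess
        output = tool(**call.get("arguments", {}))
        results.append(
            {"role": "tool", "name": call["name"], "content": json.dumps(output)}
        )
    return results


reply = (
    "<tool_call>\n"
    '{"name": "controlla_liste_pep", "arguments": {"nominativo": "Al-Rashid"}}\n'
    "</tool_call>"
)
print(run_tool_calls(reply))
```

The returned messages would be appended to the conversation before asking the model for its final, policy-grounded answer.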

	### 📊 Document Reranking

	```python
	# System prompt (Italian): "Rate the relevance of each candidate to the query.
	# Return only the relevant candidates with a 0-100 score.
	# Reply ONLY with the requested JSON."
	messages = [
	    {
	        "role": "system",
	        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Restituisci solo i candidati rilevanti con score 0-100. Rispondi SOLO con il JSON richiesto.",
	    },
	    {
	        "role": "user",
	        "content": '{"query": "requisiti CET1 fondi propri", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR", "content": "..."}, {"id": "doc_002", "title": "DORA Art. 5", "content": "..."}]}',
	    },
	]

	# Expected: {"matches": [{"id": "doc_001", "relevance": 95}]}
	```
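Once the reply is parsed, the scores can be used to filter and order the original candidates. A minimal sketch, assuming the `{"matches": [{"id": ..., "relevance": ...}]}` shape shown above; `apply_rerank` and the `min_score` cut-off of 50 are illustrative choices, not values from the card.

```python
import json


def apply_rerank(reply: str, candidates: list[dict], min_score: int = 50) -> list[dict]:
    """Keep candidates the model marked relevant, ordered by descending score."""
    matches = json.loads(reply).get("matches", [])
    scores = {m["id"]: m["relevance"] for m in matches}
    kept = [
        c for c in candidates
        if c["id"] in scores and scores[c["id"]] >= min_score
    ]
    return sorted(kept, key=lambda c: scores[c["id"]], reverse=True)


candidates = [
    {"id": "doc_001", "title": "Art. 92 CRR"},
    {"id": "doc_002", "title": "DORA Art. 5"},
]
reply = '{"matches": [{"id": "doc_001", "relevance": 95}]}'
print([c["id"] for c in apply_rerank(reply, candidates)])  # ['doc_001']
```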

	---

	## ⚙️ Training Details

	| Setting | Value |
	|---|---|
	| 🧬 Method | LoRA on bf16 weights (no quantization) |
	| 🏗️ Base Model | Qwen2.5-14B-Instruct |
	| 📦 Dataset | 923 train / 102 eval samples |
	| ⏱️ Duration | 23.5 minutes |

	### 📉 Training Metrics

	| Metric | Value |
	|---|---|
	| Final Train Loss | 1.127 |
	| Best Eval Loss | 1.225 (step 640/693) |
	| Train/Eval Gap | 0.098 ✅ |

	> The gap of 0.098 between best eval loss (1.225) and final train loss (1.127) suggests stable training without significant overfitting.

	---

	## 📚 Dataset Coverage

	The training data covers the full lifecycle of a RAG-based compliance assistant:

	| Category | Purpose |
	|---|---|
	| 🏷️ Title Generation | Generate conversation titles from user queries |
	| 🔍 Query Expansion | Enrich queries with regulatory terms for better retrieval |
	| 🧠 Intent Classification | Route queries to RAG vs conversational responses |
	| 📊 Document Reranking | Score retrieved documents by relevance |
	| 📝 Topic Extraction | Extract main topics from regulatory text pages |
	| 📖 Document Summarization | Summarize multi-page regulatory documents |
	| ⚖️ Relevance Filtering | Filter regulatory text relevant to banks |
	| 📅 Metadata Extraction | Find application dates, issuing authorities |
	| 🔧 Impact Analysis | Cross-reference regulations vs internal procedures |
	| 💬 RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

	Regulatory sources covered: CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

	---

	## 🚀 Deployment

	### With vLLM
	```bash
	vllm serve ./models/RegTech-14B-Instruct --dtype bfloat16
	```
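Once the server is running it exposes an OpenAI-compatible chat endpoint. A minimal standard-library sketch of a client request, assuming the default port 8000 and that the served model name equals the path passed to `vllm serve` (both change if the server is started with other options); `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(messages, model, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    [{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}],
    model="./models/RegTech-14B-Instruct",  # assumed to match the served model name
)

# Sending the request requires a running server:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)["choices"][0]["message"]["content"]
```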

	### With Transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	repo_id = "Sophia-AI/RegTech-14B-Instruct"
	model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="bfloat16", device_map="auto")
	tokenizer = AutoTokenizer.from_pretrained(repo_id)

	# `messages` is a chat list such as the ones in the usage examples above
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)

	# Decode only the newly generated tokens, not the prompt
	new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))
	```

	---

	## ⚠️ Important Notes

	- 🎯 RAG-optimized — trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
	- 🏦 Domain-specific — optimized for Italian banking compliance. General capabilities may differ from the base model.
	- ⚖️ Not legal advice — a tool to assist compliance professionals, not a substitute for regulatory expertise.
	- 🔧 Tool schemas — tool calling works best with the specific function signatures used during training.

	---

	<p align="center">
	Built with ❤️ for banking RAG<br>
	<em>Fine-tuned with LoRA • Evaluated by Claude Opus 4.6 • Powered by Qwen2.5</em><br>
	<em>Contact for commercial use: https://landing.2sophia.ai</em>
	</p>