---
language:
- it
- en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-14B-Instruct
tags:
- lora
- fine-tuned
- banking
- regtech
- compliance
- rag
- tool-calling
- italian
- qwen2.5
pipeline_tag: text-generation
---

# RegTech-14B-Instruct

> **Fine-tuned for RAG-powered banking compliance, not general knowledge.**

A specialized [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model fine-tuned to excel within a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.

This model doesn't try to memorize regulations. It is trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, and maintain the right tone and terminology when grounded on regulatory documents.

---

## What This Model Does

This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:

| Task | Description |
|---|---|
| **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| **Tool Calling** | KYC verification, risk scoring, PEP checks, SOS (suspicious transaction) reporting |
| **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| **Intent Detection** | Classify whether a message needs a document search or is conversational |
| **Document Reranking** | Score candidate documents by relevance |
| **Structured JSON** | Topic extraction, metadata, and impact analysis in JSON format |
| **Impact Analysis** | Cross-reference external regulations against internal bank procedures |

---

## Evaluation: LLM-as-Judge

Evaluated by **Claude Opus 4.6** (Anthropic) across 11 blind test scenarios. The judge compared base vs fine-tuned model outputs without knowing which was which.

### Head-to-Head

```
Tuned wins   8/11   (77.3%)
Base wins    2/11   (22.7%)
Ties         1/11
```
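The percentages above are not the raw fractions 8/11 and 2/11 (72.7% and 18.2%). They do match a convention in which a tie counts as half a win for each side. That convention is inferred from the reported numbers, not stated in the card, so treat the sketch below as one plausible reading:

```python
def win_rate(wins: int, ties: int, total: int) -> float:
    """Win rate in percent, counting each tie as half a win (assumed convention)."""
    return 100 * (wins + 0.5 * ties) / total


tuned = win_rate(wins=8, ties=1, total=11)
base = win_rate(wins=2, ties=1, total=11)
print(f"tuned {tuned:.1f}%  base {base:.1f}%")  # tuned 77.3%  base 22.7%
```

Under the same convention, the 68.2% reported for the other model sizes corresponds to 7.5 wins out of 11.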

### Quality Scores (1-5)

| Criterion | Base | Tuned | Delta |
|---|:---:|:---:|:---:|
| Instruction Following | 3.55 | **4.64** | +1.09 |
| Context Adherence | 3.82 | **4.82** | +1.00 |
| Accuracy | 4.00 | **4.73** | +0.73 |
| Format | 4.18 | **4.45** | +0.27 |
| Tone | 4.73 | **4.82** | +0.09 |
| **Overall** | **4.06** | **4.69** | **+0.64** |

> At 77.3%, this is the highest win rate across the four model sizes. Instruction following rises by 1.09 points and context adherence by 1.00, the two largest gains, which suggests that the fine-tuning mainly improves the model's ability to stay grounded on retrieved regulatory context.
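
The Overall row is consistent with an unweighted mean of the five criteria. This is an inference from the figures, not something the card states. A quick check, with the scores copied from the table:

```python
from statistics import mean

# (base, tuned) judge scores per criterion, copied from the table above
scores = {
    "Instruction Following": (3.55, 4.64),
    "Context Adherence": (3.82, 4.82),
    "Accuracy": (4.00, 4.73),
    "Format": (4.18, 4.45),
    "Tone": (4.73, 4.82),
}

base_overall = mean(b for b, _ in scores.values())
tuned_overall = mean(t for _, t in scores.values())
delta = tuned_overall - base_overall

print(round(base_overall, 2), round(tuned_overall, 2), round(delta, 2))
# 4.06 4.69 0.64
```

The delta is computed from the unrounded means, which is why it reads +0.64 and not the 0.63 obtained by subtracting the two rounded values.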

### Results by Category

| Category | Base | Tuned | Tie |
|---|:---:|:---:|:---:|
| RAG Q&A | 0 | **2** | 0 |
| Refusal Handling | 0 | **2** | 0 |
| Edge Cases | 0 | **1** | 0 |
| Style & Tone | 0 | **1** | 0 |
| Data Extraction | 0 | 0 | 1 |
| JSON Output | 1 | 1 | 0 |
| Tool Use | 1 | 1 | 0 |

### Comparison Across Model Sizes

| Metric | 4B | 7B | 14B | 32B |
|---|:---:|:---:|:---:|:---:|
| Base score (pre-tuning) | 4.11 | 3.84 | 4.06 | **4.36** |
| Tuned score | 4.68 | 4.78 | 4.69 | **4.80** |
| Delta (improvement) | +0.57 | +0.95 | +0.64 | +0.44 |
| Win rate | 68.2% | 68.2% | **77.3%** | 68.2% |
| Best eval loss | 1.191 | 1.330 | 1.225 | **0.813** |
| Token accuracy | ~73% | ~72% | ~72% | **~81%** |

---

## Usage Examples

### RAG Q&A: Answering from Retrieved Context

The model is designed to receive **retrieved regulatory documents as context** and to answer based on them:

```python
# System prompt (Italian): "You are a banking compliance assistant.
# Answer ONLY on the basis of the provided context."
messages = [
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria.
Rispondi SOLO basandoti sul contesto fornito.
<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
Il coefficiente è calcolato come rapporto tra i fondi propri e
l'importo complessivo dell'esposizione al rischio.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        # "What are the minimum capital requirements under the CRR?"
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
```
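In a running pipeline the context block is filled in by the retriever. Below is a minimal sketch of a helper that wraps retrieved passages in the same `<contesto_recuperato>` tags and reuses the system prompt from the example above. The function name `build_rag_messages` and the blank-line separator between passages are assumptions for illustration, not part of the model card.

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Build a chat message list with retrieved passages in the system prompt.

    Hypothetical helper: the prompt wording and tag name follow the example
    above; joining passages with a blank line is an assumption.
    """
    context = "\n\n".join(p.strip() for p in passages)
    system = (
        "Sei un assistente per la compliance bancaria.\n"
        "Rispondi SOLO basandoti sul contesto fornito.\n"
        f"<contesto_recuperato>\n{context}\n</contesto_recuperato>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


rag_messages = build_rag_messages(
    "Quali sono i requisiti minimi di capitale secondo il CRR?",
    ["Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti requisiti: ..."],
)
```

The returned list can be passed to `tokenizer.apply_chat_template` or sent to an OpenAI-compatible endpoint unchanged.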

### Query Expansion: Improving RAG Retrieval

```python
# System prompt (Italian): "Rewrite the user's query into a richer version to
# improve document retrieval (RAG). Add technical terms and regulatory
# references. Reply ONLY with the requested JSON."
messages = [
    {
        "role": "system",
        "content": "Riscrivi la query dell'utente in una versione più ricca per migliorare il recupero documentale (RAG). Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]
# Expected output:
# {"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007
#  art. 35 riciclaggio finanziamento terrorismo portale RADAR tempistiche
#  invio indicatori anomalia"}
```
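Because the expanded query feeds straight into retrieval, it is worth guarding against malformed output. The sketch below parses the `{"query": ...}` object shown above and falls back to the original query when parsing fails. The function name and the fallback policy are assumptions for illustration.

```python
import json


def parse_expanded_query(raw_output: str, original_query: str) -> str:
    """Return the expanded query, or the original one if the output is unusable."""
    try:
        data = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return original_query
    if not isinstance(data, dict):
        return original_query
    query = data.get("query")
    if isinstance(query, str) and query.strip():
        return query.strip()
    return original_query


expanded = parse_expanded_query(
    '{"query": "obblighi segnalazione operazioni sospette SOS UIF"}',
    "obblighi segnalazione operazioni sospette",
)
```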

### Tool Calling: Compliance Workflows

```python
# System prompt (Italian): "You are an operational assistant for compliance."
# The retrieved policy says enhanced due diligence (EDD) applies to PEPs,
# high-risk countries, and profiles with a risk score above 60.
messages = [
    {
        "role": "system",
        "content": """Sei un assistente operativo per la compliance.
<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>
<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        # "I need to open an account for a company based in Dubai.
        #  The legal representative is Mr. Al-Rashid."
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]
# Expected behavior:
# 1. Call controlla_liste_pep for the representative
# 2. Call calcola_scoring_rischio based on risk factors
# 3. Recommend the EDD procedure per AML-003, grounded on the retrieved policy
```
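The card does not show what the model emits when it decides to call a tool. The sketch below assumes the base Qwen2.5 convention of a JSON object wrapped in `<tool_call>` tags with `name` and `arguments` keys; the fine-tuned model's actual format may differ. The stub implementation and its `nominativo` argument are hypothetical, since the real parameter schemas are elided above.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Return (name, arguments) pairs for every well-formed tool call in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append((payload["name"], payload.get("arguments", {})))
    return calls


def dispatch(calls, registry):
    """Run each call against a name -> function registry."""
    results = []
    for name, args in calls:
        func = registry.get(name)
        if func is None:
            results.append({"name": name, "error": "unknown tool"})
        else:
            results.append({"name": name, "result": func(**args)})
    return results


def controlla_liste_pep(nominativo: str) -> dict:
    """Hypothetical stub; a real implementation would query a PEP list."""
    return {"nominativo": nominativo, "pep": False}


registry = {"controlla_liste_pep": controlla_liste_pep}
sample_output = (
    '<tool_call>\n{"name": "controlla_liste_pep", '
    '"arguments": {"nominativo": "Al-Rashid"}}\n</tool_call>'
)
results = dispatch(extract_tool_calls(sample_output), registry)
```

Each result would then be appended to the conversation as a tool message before asking the model for its final recommendation.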

### Document Reranking

```python
# System prompt (Italian): "Rate the relevance of each candidate to the query.
# Return only the relevant candidates with a 0-100 score.
# Reply ONLY with the requested JSON."
messages = [
    {
        "role": "system",
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Restituisci solo i candidati rilevanti con score 0-100. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1 fondi propri", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR", "content": "..."}, {"id": "doc_002", "title": "DORA Art. 5", "content": "..."}]}'
    }
]
# Expected: {"matches": [{"id": "doc_001", "relevance": 95}]}
```
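To use the reranker output downstream, the `matches` list has to be mapped back onto the candidate documents. A minimal sketch, assuming the `{"matches": [{"id": ..., "relevance": ...}]}` shape shown above; the function name and the `min_relevance` threshold are illustrative additions.

```python
import json


def apply_rerank(raw_output: str, candidates: list[dict], min_relevance: float = 0) -> list[dict]:
    """Return the candidates the model kept, ordered by descending relevance."""
    try:
        matches = json.loads(raw_output).get("matches", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # not JSON, or JSON that is not an object
    if not isinstance(matches, list):
        return []
    by_id = {c["id"]: c for c in candidates}
    scored = [
        (m["relevance"], by_id[m["id"]])
        for m in matches
        if isinstance(m, dict)
        and m.get("id") in by_id
        and isinstance(m.get("relevance"), (int, float))
        and m["relevance"] >= min_relevance
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


candidates = [
    {"id": "doc_001", "title": "Art. 92 CRR"},
    {"id": "doc_002", "title": "DORA Art. 5"},
]
ranked = apply_rerank('{"matches": [{"id": "doc_001", "relevance": 95}]}', candidates)
```

Unknown ids and non-numeric scores are dropped silently here; a production pipeline might prefer to log them.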

---

## Training Details

| Setting | Value |
|---|---|
| **Method** | LoRA in bf16 (no quantization) |
| **Base Model** | Qwen2.5-14B-Instruct |
| **Dataset** | 923 train / 102 eval samples |
| **Duration** | 23.5 minutes |

### Training Metrics

| Metric | Value |
|---|---|
| Final Train Loss | 1.127 |
| Best Eval Loss | 1.225 (step 640/693) |
| Train/Eval Gap | 0.098 |

> The 0.098 gap between eval and train loss is small, which suggests **stable training without pronounced overfitting**.

---

## Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

| Category | Purpose |
|---|---|
| Title Generation | Generate conversation titles from user queries |
| Query Expansion | Enrich queries with regulatory terms for better retrieval |
| Intent Classification | Route queries to RAG vs conversational responses |
| Document Reranking | Score retrieved documents by relevance |
| Topic Extraction | Extract main topics from regulatory text pages |
| Document Summarization | Summarize multi-page regulatory documents |
| Relevance Filtering | Filter regulatory text relevant to banks |
| Metadata Extraction | Find application dates, issuing authorities |
| Impact Analysis | Cross-reference regulations vs internal procedures |
| RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

**Regulatory sources covered:** CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

---

## Deployment

### With vLLM

```bash
vllm serve ./models/RegTech-14B-Instruct --dtype bfloat16
```
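Once the server is up, it exposes an OpenAI-compatible chat endpoint. The sketch below builds and sends a request using only the standard library. It assumes vLLM's default port 8000 and that the served model name equals the path passed to `vllm serve` (the default when `--served-model-name` is not set); the function names are illustrative.

```python
import json
import urllib.request


def build_chat_request(messages, model="./models/RegTech-14B-Instruct",
                       base_url="http://localhost:8000/v1", max_tokens=512):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


request = build_chat_request(
    [{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}]
)
```

Calling `chat(messages)` with any of the message lists from the usage examples sends it to the running server.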

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Sophia-AI/RegTech-14B-Instruct"  # or a local path to the weights
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# `messages` is a chat list like the ones in the usage examples
messages = [{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

---

## Important Notes

- **RAG-optimized**: trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- **Domain-specific**: optimized for Italian banking compliance. General capabilities may differ from the base model.
- **Not legal advice**: a tool to assist compliance professionals, not a substitute for regulatory expertise.
- **Tool schemas**: tool calling works best with the specific function signatures used during training.

---

<p align="center">
Built with ❤️ for banking RAG<br>
<em>Fine-tuned with LoRA • Evaluated by Claude Opus 4.6 • Powered by Qwen2.5</em><br>
<em>Contact For Commercial Use: https://landing.2sophia.ai</em>
</p>