Llama-3-8B SQL Expert (Fine-Tuned)

A fine-tuned version of Meta's Llama-3-8B that converts natural language questions into SQL queries, trained with LoRA adapters.

Model Details

  • Base Model: meta-llama/Meta-Llama-3-8B
  • Fine-tuned by: ESTU Research Team (Kulalı, Aydın, Alhan, Fidan)
  • Institution: Eskisehir Technical University
  • Project: TÜBİTAK 2209-A Research
  • License: MIT
  • Language: English
  • Task: Natural Language to SQL Translation
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)

Performance

  • Execution Accuracy: 78.2%
  • Exact Match: 68.9%
  • Average Latency: 450ms
  • Model Size: 16.2 GB (full) / 196 MB (LoRA adapters only)
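
Execution accuracy counts a prediction as correct when its result set matches the gold query's result set, while exact match compares the query text itself. Below is a minimal sketch of the execution-accuracy check against SQLite; the database path and harness are illustrative assumptions, not the project's actual evaluation scripts.

import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same result set (order-insensitive)."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # the predicted query failed to execute
    finally:
        conn.close()
    return sorted(map(tuple, pred_rows)) == sorted(map(tuple, gold_rows))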

Training Details

Training Data

The model was fine-tuned on natural language question / SQL query pairs, drawn primarily from a sales database schema (see Limitations).

Training Configuration

{
  "base_model": "meta-llama/Meta-Llama-3-8B",
  "method": "LoRA",
  "rank": 16,
  "alpha": 32,
  "dropout": 0.05,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
  ],
  "epochs": 4,
  "batch_size": 8,
  "learning_rate": 2e-4,
  "training_time": "12.4 hours (A100 GPU)"
}
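
For reference, the configuration above maps onto peft's LoraConfig roughly as follows; this is a sketch of the adapter setup only, with the dataset and Trainer pipeline omitted.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hyperparameters taken from the training configuration above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable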

Training Results

Epoch 1: Loss 1.234 | Val Loss 1.289 | Accuracy 69.4%
Epoch 2: Loss 0.543 | Val Loss 0.612 | Accuracy 74.1%
Epoch 3: Loss 0.298 | Val Loss 0.334 | Accuracy 76.8%
Epoch 4: Loss 0.187 | Val Loss 0.221 | Accuracy 78.2%

Usage

Installation

pip install transformers torch peft accelerate

Quick Start (LoRA)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 so it fits in ~16 GB of memory
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Load LoRA adapters on top of the base model
model = PeftModel.from_pretrained(base_model, "estu-research/llama3-8b-sql-ft")

# Example query
question = """
Schema: CREATE TABLE customers (customerNumber INT, customerName VARCHAR(50), country VARCHAR(50));
Question: List all customers from France
"""

inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt
sql = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(sql)
# Output: SELECT * FROM customers WHERE country = 'France';
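
If a standalone checkpoint is preferred (as used in the next section), the adapters can be merged into the base weights with peft's merge_and_unload; the local save path below is just an example.

# Merge the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("llama3-8b-sql-merged")
tokenizer.save_pretrained("llama3-8b-sql-merged")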

Full Model Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (base weights with the LoRA adapters folded in)
model = AutoModelForCausalLM.from_pretrained(
    "estu-research/llama3-8b-sql-ft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("estu-research/llama3-8b-sql-ft")

question = "Show top 10 products by price"
inputs = tokenizer(question, return_tensors="pt").to(model.device)
# Low temperature keeps output near-deterministic; sampling must be enabled for it to apply
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sql)

Pipeline Usage

from transformers import pipeline

pipe = pipeline("text-generation", model="estu-research/llama3-8b-sql-ft", device_map="auto")

result = pipe(
    "Schema: CREATE TABLE orders (orderDate DATE, amount DECIMAL);\nQuestion: Total sales by month in 2024",
    max_new_tokens=200,
    return_full_text=False,  # return only the generated SQL, not the echoed prompt
)
print(result[0]["generated_text"])

Example Queries

| Natural Language | Generated SQL |
|---|---|
| Count orders per customer | SELECT customerName, COUNT(orderNumber) FROM customers JOIN orders USING(customerNumber) GROUP BY customerNumber; |
| Average order value | SELECT AVG(quantityOrdered * priceEach) as avg_value FROM orderDetails; |
| Customers with no orders | SELECT customerName FROM customers WHERE customerNumber NOT IN (SELECT DISTINCT customerNumber FROM orders); |

Comparison with Other Models

| Model | Accuracy | Latency | Cost | Size |
|---|---|---|---|---|
| GPT-4o-mini (FT) | 97.8% | 800ms | $0.30/1K | Cloud |
| GPT-4 | 92.1% | 1200ms | $3.00/1K | Cloud |
| Llama-3-8B (FT) | 78.2% | 450ms | Free | 16 GB |
| Gemma-7B (FT) | 76.0% | 500ms | Free | 14 GB |
| GPT-3.5 Turbo | 78.9% | 500ms | $0.05/1K | Cloud |

Advantages

  • Open Source: Fully downloadable and modifiable
  • Cost-Effective: Free self-hosting
  • Privacy: On-premise deployment
  • Fast: 450ms average latency
  • Efficient: LoRA adapters only 196 MB

Limitations

  • Trained primarily on sales database schema
  • May struggle with highly complex nested queries
  • Best performance on English language queries
  • Requires GPU for optimal inference (4-8 GB VRAM minimum)

Intended Use

  • Primary: Natural language to SQL translation for analytics
  • Secondary: SQL query assistance and education
  • Self-Hosted: Privacy-sensitive applications
  • Not For: Production databases without query validation
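
One minimal way to enforce that validation is to gate generated queries so that only a single read-only SELECT statement ever reaches the database. The sketch below uses a simple keyword denylist; this is not a complete defense, so pair it with database-level permissions in any real deployment.

import re

# Reject anything containing write/DDL keywords (a minimal, non-exhaustive denylist)
WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Accept only a single SELECT statement with no write keywords."""
    statements = [s.strip() for s in sql.strip().split(";") if s.strip()]
    if len(statements) != 1:
        return False  # multiple statements are suspicious
    stmt = statements[0]
    return stmt.upper().startswith("SELECT") and not WRITE_KEYWORDS.search(stmt)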

Hardware Requirements

Inference

  • Minimum: 8 GB VRAM (with quantization)
  • Recommended: 16 GB VRAM (full precision)
  • Optimal: 24 GB VRAM (batch processing)

LoRA Only

  • Minimum: 4 GB VRAM
  • Recommended: 8 GB VRAM
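
To reach the lower end of these requirements, the base model can be loaded in 4-bit via bitsandbytes before attaching the adapters. A sketch, assuming bitsandbytes is installed (pip install bitsandbytes):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization substantially reduces the VRAM needed for the 8B base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "estu-research/llama3-8b-sql-ft")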

Citation

@misc{llama3-8b-sql-ft,
  title={Llama-3-8B SQL Expert: Fine-Tuned Model for Text-to-SQL},
  author={Kulalı and Aydın and Alhan and Fidan},
  institution={Eskisehir Technical University},
  year={2024},
  url={https://huggingface.co/estu-research/llama3-8b-sql-ft}
}

Acknowledgments

This work was supported by TÜBİTAK 2209-A Research Grant at Eskisehir Technical University.

Special thanks to Meta AI for releasing Llama-3.

License

MIT License - See LICENSE file for details
