Model Card for Mistral-QLoRA-Alpaca

Model Details

Model Description

This model is a QLoRA fine-tuned version of Mistral-7B trained on the Alpaca dataset for instruction-following tasks. It demonstrates parameter-efficient fine-tuning using LoRA adapters with 4-bit quantization.

  • Developed by: Sujith Reddy
  • Model type: Instruction-tuned causal language model (QLoRA adapter)
  • Language: English
  • License: Apache 2.0
  • Finetuned from model: mistralai/Mistral-7B-v0.1

Uses

Direct Use

  • Instruction following
  • Question answering
  • Educational and research purposes
  • NLP experimentation

Downstream Use

  • Can be integrated into chatbots
  • Can be fine-tuned further for domain-specific tasks

Out-of-Scope Use

  • Medical, legal, or financial advice
  • Safety-critical applications
  • Production systems without further validation

Bias, Risks, and Limitations

  • Inherits biases from base model and training data
  • May generate incorrect or misleading outputs
  • Limited generalization due to small dataset (~5000 samples)

Recommendations

Users should validate outputs before relying on them in real-world applications and should avoid using the model in sensitive domains.


How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base checkpoint and the LoRA adapter trained on top of it
base_model = "mistralai/Mistral-7B-v0.1"
adapter = "Sujith2121/mistral-qlora-alpaca"

# Load the tokenizer and base model, then attach the adapter weights
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = "Explain machine learning in simple terms"

# Move the inputs to whichever device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
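
The snippet above loads the base model without quantization, which needs considerably more GPU memory than the 4-bit setup used for training. The following is a minimal sketch of loading the base model in 4-bit through BitsAndBytes before attaching the adapter. The function name `load_adapter_4bit` is hypothetical, and the NF4 and float16 settings are assumptions for illustration; this card does not state the exact quantization settings that were used.

```python
def load_adapter_4bit(base_model: str, adapter: str):
    """Load the base model in 4-bit and attach the LoRA adapter.

    Imports are done lazily so that defining the function does not require
    a GPU. Calling it needs torch, transformers, peft, accelerate and
    bitsandbytes installed, plus a CUDA device.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import PeftModel

    # Assumed settings: NF4 quantization with float16 compute
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quant_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()  # inference mode: disables dropout in the adapter layers
    return tokenizer, model
```

It would be called as `tokenizer, model = load_adapter_4bit("mistralai/Mistral-7B-v0.1", "Sujith2121/mistral-qlora-alpaca")`, after which generation proceeds as in the snippet above.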


Training Details

Training Data

  • Dataset: tatsu-lab/alpaca
  • Type: Instruction-response dataset
  • Samples used: ~5000

Training Procedure

Preprocessing

  • Converted instruction, input, and output into formatted prompts
  • Tokenized with max length of 256
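
The prompt formatting step can be sketched as follows. The card does not give the exact template, so this example assumes the template published with the original Alpaca dataset; `format_example` and the two constant names are hypothetical.

```python
# Assumed template: the one distributed with the original Alpaca dataset.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(example: dict) -> str:
    """Build one training string from a record with instruction/input/output."""
    extra_input = (example.get("input") or "").strip()
    if extra_input:
        prompt = PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=extra_input
        )
    else:
        # Many Alpaca records have an empty input field
        prompt = PROMPT_NO_INPUT.format(instruction=example["instruction"])
    return prompt + example["output"]
```

The resulting string would then be tokenized with something like `tokenizer(text, truncation=True, max_length=256)`, so responses that push the sequence past 256 tokens are cut off.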

Training Hyperparameters

  • Training regime: fp16 mixed precision (QLoRA)
  • Epochs: 1
  • Learning rate: 2e-4
  • Batch size: 2 with gradient accumulation
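
As a rough illustration, the hyperparameters above map onto `transformers.TrainingArguments` keyword names as shown below. The number of gradient accumulation steps is not stated in this card, so the value here is a placeholder, and treating the batch size of 2 as per-device on a single GPU is also an assumption. The helper `approx_optimizer_steps` is hypothetical.

```python
import math

# Values from the card, except where marked as assumptions.
training_kwargs = {
    "num_train_epochs": 1,
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,  # hypothetical: not stated in the card
    "fp16": True,  # fp16 mixed precision
}


def approx_optimizer_steps(num_samples: int, kwargs: dict) -> int:
    """Approximate number of optimizer updates on a single GPU."""
    batches_per_epoch = math.ceil(num_samples / kwargs["per_device_train_batch_size"])
    steps_per_epoch = math.ceil(batches_per_epoch / kwargs["gradient_accumulation_steps"])
    return steps_per_epoch * kwargs["num_train_epochs"]


# With ~5000 samples and the placeholder accumulation value above
steps = approx_optimizer_steps(5000, training_kwargs)
```

Under these assumptions the effective batch size is 16 and one epoch amounts to a few hundred optimizer updates; a different accumulation value changes both numbers.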

Speeds, Sizes, Times

  • Training time: ~2–4 hours
  • GPU: NVIDIA T4 (Kaggle)

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Held-out subset (~50–100 samples) from Alpaca dataset

Factors

  • Instruction-following quality
  • Response completeness

Metrics

  • BLEU
  • ROUGE-L
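
To make the second metric concrete, here is a small pure-Python sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens. It is an illustration only and not the evaluation code behind the reported numbers: it uses plain whitespace tokenization and returns values in the range 0 to 1, whereas library implementations use their own tokenizers and the results in this card appear to be on a 0 to 100 scale. The function names are hypothetical.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference and a candidate string."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat sat on the mat", "the cat is on the mat")` shares the five-token subsequence "the cat on the mat" out of six tokens on each side, so precision and recall are both 5/6.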

Results

Metric     Base Model   QLoRA Model
BLEU       18.5         27.2
ROUGE-L    30.1         42.8

Summary

On the held-out subset, the QLoRA model scores higher than the base model on both BLEU (27.2 vs. 18.5) and ROUGE-L (42.8 vs. 30.1), and it generates more detailed and relevant responses to instructions.


Environmental Impact

  • Hardware Type: NVIDIA T4 GPU
  • Hours used: ~3 hours
  • Cloud Provider: Kaggle
  • Compute Region: Unknown
  • Carbon Emitted: Not calculated
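
Although emissions were not calculated, a rough upper bound on GPU energy use can be sketched from the figures above. This assumes the T4 ran at its rated 70 W board power for the full ~3 hours and ignores CPU, memory, and datacenter overhead. Because the compute region is unknown, the grid carbon intensity is left as a parameter rather than filled in. Both function names are hypothetical.

```python
def gpu_energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kWh for a device drawing constant power."""
    return power_watts * hours / 1000.0


def emissions_kg(energy_kwh: float, grid_kg_co2_per_kwh: float) -> float:
    """CO2-equivalent emissions for a given grid carbon intensity."""
    return energy_kwh * grid_kg_co2_per_kwh


T4_BOARD_POWER_WATTS = 70  # rated maximum, so this is an upper bound
HOURS_USED = 3             # "~3 hours" from the card

energy = gpu_energy_kwh(T4_BOARD_POWER_WATTS, HOURS_USED)  # about 0.21 kWh
```

Passing `energy` and a region-specific intensity to `emissions_kg` would turn this into an emissions estimate once the region is known.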

Technical Specifications

Model Architecture and Objective

  • Transformer-based architecture (Mistral-7B)
  • Objective: Next-token prediction (causal language modeling)

Compute Infrastructure

Hardware

  • NVIDIA T4 GPU
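
A back-of-the-envelope calculation shows why 4-bit quantization matters on this hardware. The sketch below assumes roughly 7.24 billion parameters for Mistral-7B and counts weight storage only; activations, LoRA parameters, optimizer state, and quantization metadata add to the total. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


APPROX_PARAMS = 7.24e9   # approximate parameter count (assumption)
T4_MEMORY_GB = 16        # nominal T4 memory

fp16_gb = weight_memory_gb(APPROX_PARAMS, 16)  # about 14.5 GB
int4_gb = weight_memory_gb(APPROX_PARAMS, 4)   # about 3.6 GB
```

In fp16 the weights alone would nearly fill the T4's nominal 16 GB, leaving little room for training, while the 4-bit weights leave most of the memory free for activations and adapter training.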

Software

  • Transformers
  • PEFT
  • BitsAndBytes
  • Datasets
  • Evaluate

Citation

BibTeX:

@misc{mistral_qlora_alpaca,
  author = {Sujith Reddy},
  title = {Mistral QLoRA Alpaca},
  year = {2026},
  publisher = {Hugging Face}
}

APA:

Reddy, S. (2026). Mistral QLoRA Alpaca. Hugging Face.


Model Card Contact

Sujith Reddy

Framework versions

  • PEFT 0.18.1