---
language:
- en
license: apache-2.0
library_name: peft
tags:
- medical
- healthcare
- question-answering
- conversational-ai
- medical-qa
- clinical-nlp
- lora
- medphi
- patient-education
base_model: microsoft/MediPhi-Instruct
datasets:
- private
pipeline_tag: text-generation
model-index:
- name: medphi-medical-qa-adapter
  results:
  - task:
      type: question-answering
      name: Medical Question Answering
    dataset:
      name: Medical Screening Dataset
      type: custom
    metrics:
    - name: Training Loss
      type: loss
      value: 0.6441
    - name: Validation Loss
      type: loss
      value: 0.6446
---

# MediPhi Medical QA Adapter

This is a LoRA adapter for Microsoft's [MediPhi-Instruct](https://huggingface.co/microsoft/MediPhi-Instruct), fine-tuned for medical question answering. It is designed to give detailed answers to questions about diseases, medical conditions, and other health-related topics.

## Model Description

- **Model Type:** LoRA Adapter for Causal Language Model
- **Base Model:** microsoft/MediPhi-Instruct (3.8B parameters)
- **Trainable Parameters:** 0.328% (12.5M parameters via LoRA)
- **Language:** English
- **Domain:** Medical/Healthcare
- **Task:** Question Answering, Conversational AI
- **License:** Apache 2.0

### Model Purpose

This model serves as a medical assistant chatbot capable of answering user queries about medical conditions, diseases, symptoms, treatments, and genetic disorders. It has been fine-tuned on 16,406 medical Q&A pairs covering a wide range of health topics including rare genetic disorders and common medical conditions.

## Key Features

- **Medical Domain Expertise:** Trained on diverse medical Q&A covering diseases and conditions
- **Comprehensive Responses:** Generates detailed explanations including definitions, causes, symptoms, and treatments
- **Step-by-Step Reasoning:** Employs structured thinking for medical information delivery
- **Efficient Fine-tuning:** Uses 4-bit quantization with LoRA for memory efficiency
- **Patient Education Focus:** Optimized for explaining complex medical concepts clearly

## Training Data

### Dataset Statistics

- **Total Q&A Pairs:** 16,406 medical question-answer pairs
- **Dataset Size:** 21 MB
- **Data Splits:**
  - Train: 12,304 samples (75%)
  - Validation: 2,051 samples (12.5%)
  - Test: 2,051 samples (12.5%)
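
The split sizes above follow directly from the 75% / 12.5% / 12.5% fractions applied to 16,406 pairs. The sketch below reproduces the arithmetic with a seeded shuffle. The helper name `split_indices`, the seed, and the shuffle-then-slice procedure are assumptions for illustration; the card does not state how the actual split was drawn.

```python
import random


def split_indices(n, val_frac=0.125, test_frac=0.125, seed=42):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts.

    Hypothetical helper: the procedure and seed are illustrative only.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    n_train = n - n_val - n_test  # the remainder goes to training
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(16406)
print(len(train_idx), len(val_idx), len(test_idx))  # 12304 2051 2051
```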

### Data Coverage

The dataset covers a wide range of medical topics including:
- **Rare Genetic Disorders:** Tourette syndrome, Denys-Drash syndrome, etc.
- **Common Conditions:** Dry eye syndrome, immunodeficiency disorders
- **Medical Concepts:** Genetic inheritance patterns, diagnostic methods
- **Treatment Information:** Management strategies, preventive care

### Data Format

```python
{
    "messages": [
        {
            "role": "system",
            "content": "You are a knowledgeable medical assistant. Provide accurate information about medical conditions and diseases. Always think step by step."
        },
        {
            "role": "user",
            "content": "What is [medical condition]?"
        },
        {
            "role": "assistant",
            "content": "[Comprehensive medical explanation]"
        }
    ]
}
```
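
A raw question-answer pair can be wrapped into this structure with a small helper. This is a minimal sketch: the function name `to_chat_record` and the sample pair are hypothetical, and only the message layout and system prompt come from the format shown above.

```python
import json

SYSTEM_PROMPT = (
    "You are a knowledgeable medical assistant. Provide accurate information "
    "about medical conditions and diseases. Always think step by step."
)


def to_chat_record(question, answer):
    """Wrap one Q&A pair in the system/user/assistant message layout."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question.strip()},
            {"role": "assistant", "content": answer.strip()},
        ]
    }


# Placeholder pair, not taken from the training data
record = to_chat_record("What is [medical condition]?", "[Comprehensive medical explanation]")
jsonl_line = json.dumps(record)  # one line of a JSONL training file
print([m["role"] for m in record["messages"]])  # ['system', 'user', 'assistant']
```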

## Training Details

### Training Configuration

- **Framework:** PyTorch with Hugging Face Transformers
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) with SFT (Supervised Fine-Tuning)
- **Quantization:** 4-bit NF4 with double quantization
- **Compute:** Single RTX 5090 GPU (16 vCPU, 141 GB RAM)
- **Training Duration:** ~140 optimizer steps to convergence

### LoRA Hyperparameters

```python
{
    "r": 8,
    "lora_alpha": 32,
    "target_modules": ["o_proj", "qkv_proj", "gate_up_proj", "down_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM"
}
```
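
To show how `r` and `lora_alpha` enter an adapted layer, here is a toy pure-Python sketch of the standard LoRA update, `W + (lora_alpha / r) * B @ A`. The matrix sizes are tiny and hypothetical, and the zero initialization of `B` reflects the common LoRA default rather than anything stated in this card.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(A, B, r, lora_alpha):
    """Return the low-rank weight update (lora_alpha / r) * B @ A."""
    scale = lora_alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]


r, lora_alpha = 8, 32   # values from the configuration above
d_in, d_out = 6, 4      # hypothetical toy dimensions
rng = random.Random(0)
A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]                              # d_out x r, zeros

scale = lora_alpha / r
delta = lora_delta(A, B, r, lora_alpha)
print(scale)                                        # 4.0
print(all(v == 0.0 for row in delta for v in row))  # True
```

With `B` at zero the update vanishes, so the adapted layer starts out identical to the base layer; training then moves `A` and `B` while the base weights stay frozen.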

### Training Hyperparameters

```python
{
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "max_seq_length": 1024,
    "optim": "adamw_torch",
    "gradient_checkpointing": True,
    "packing": True
}
```
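
These settings imply an effective batch of 32 sequences per optimizer step. The arithmetic below is a rough sketch: it assumes a single GPU (as listed under Compute) and that packing fills every sequence to `max_seq_length`, so the token counts are approximate upper bounds.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 1             # single GPU, per the training configuration
max_seq_length = 1024
logged_steps = 140       # last step in the reported loss table

# Sequences consumed per optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
# Tokens per step, assuming every packed sequence is full
tokens_per_step = effective_batch * max_seq_length
tokens_seen = tokens_per_step * logged_steps

print(effective_batch)   # 32
print(tokens_per_step)   # 32768
print(tokens_seen)       # 4587520
```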

### Quantization Configuration

```python
{
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16"
}
```
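
A back-of-the-envelope estimate shows why 4-bit loading matters on a single GPU. The sketch assumes the block sizes described in the QLoRA paper (64 weights per block, 8-bit second-level constants in blocks of 256) and treats all 3.8B parameters as quantized. In practice some tensors stay in higher precision, so real memory use will differ.

```python
PARAMS = 3.8e9       # base model size from this card
BLOCK = 64           # assumption: weights per quantization block
SECOND_BLOCK = 256   # assumption: first-level constants per second-level block


def bits_per_param(double_quant):
    """Approximate storage cost of one 4-bit weight, including constants."""
    if double_quant:
        # 8-bit constant per block, plus a 32-bit constant per group of blocks
        overhead = 8 / BLOCK + 32 / (BLOCK * SECOND_BLOCK)
    else:
        overhead = 32 / BLOCK  # one 32-bit constant per block
    return 4 + overhead


def gigabytes(bits):
    return PARAMS * bits / 8 / 1e9


print(f"bf16:               {gigabytes(16):.2f} GB")
print(f"nf4:                {gigabytes(bits_per_param(False)):.2f} GB")
print(f"nf4 + double quant: {gigabytes(bits_per_param(True)):.2f} GB")
```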

## Performance

### Training Convergence

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20   | 1.1799        | 0.8300          |
| 40   | 0.7834        | 0.7168          |
| 60   | 0.7185        | 0.6892          |
| 80   | 0.6838        | 0.6710          |
| 100  | 0.6641        | 0.6592          |
| 120  | 0.6638        | 0.6515          |
| 140  | 0.6441        | 0.6446          |

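**Key Observations:**
- Loss dropped quickly: validation loss fell from 0.8300 at step 20 to 0.6446 at step 140
- Training and validation loss ended within 0.001 of each other (0.6441 vs. 0.6446), which suggests the adapter generalizes to held-out data
- Validation loss decreased at every logged step, so no overfitting was observed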
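
The figures in the table can be checked with a few lines of Python. This sketch only re-derives quantities from the logged values: the train-validation gap, the validation improvement per 20 steps, and the perplexity implied by the final validation loss (assuming the loss is mean token-level cross-entropy in nats).

```python
import math

# (step, training loss, validation loss) copied from the table above
log = [
    (20, 1.1799, 0.8300),
    (40, 0.7834, 0.7168),
    (60, 0.7185, 0.6892),
    (80, 0.6838, 0.6710),
    (100, 0.6641, 0.6592),
    (120, 0.6638, 0.6515),
    (140, 0.6441, 0.6446),
]

# Gap between training and validation loss at each logged step
gaps = {step: round(train - val, 4) for step, train, val in log}
# Validation improvement between consecutive logged steps
val_gain = [round(prev[2] - cur[2], 4) for prev, cur in zip(log, log[1:])]
# Perplexity implied by the final validation loss
final_ppl = math.exp(log[-1][2])

print(gaps[140])            # -0.0005
print(val_gain)             # [0.1132, 0.0276, 0.0182, 0.0118, 0.0077, 0.0069]
print(round(final_ppl, 2))  # 1.91
```

The shrinking but still positive gains indicate that validation loss was flattening, not rising, when logging stopped.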

### Qualitative Improvements

**Example 1 - Dry Eye Condition**

*Original Dataset Response:* Citations and contact information only

*Fine-tuned Model Response:* Comprehensive explanation covering:
- Definition and mechanism
- Environmental, aging, and medication-related causes
- Symptoms (gritty sensation, redness, blurred vision, light sensitivity)
- Treatment options (artificial tears, lifestyle modifications, medical interventions)

**Example 2 - Genetic Disorders**

*Original Dataset Response:* Basic definition of 3 types

*Fine-tuned Model Response:* Expanded information including:
- Inheritance patterns (autosomal dominant/recessive, X-linked, mitochondrial)
- Specific examples (cystic fibrosis, sickle cell disease, Huntington's disease, Down syndrome)
- Diagnostic methods and genetic testing
- Management strategies and treatment approaches
- Prevention through genetic counseling

## Usage

### Installation

```bash
pip install torch transformers peft bitsandbytes accelerate
```

### Basic Usage

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load model and tokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    "sabber/medphi-medical-qa-adapter",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("sabber/medphi-medical-qa-adapter")

# Prepare conversation
messages = [
    {
        "role": "system",
        "content": "You are a knowledgeable medical assistant. Provide accurate information about medical conditions and diseases. Always think step by step."
    },
    {
        "role": "user",
        "content": "What is Type 2 Diabetes and what are its main symptoms?"
    }
]

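# Apply the chat template and tokenize; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)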
```

### Pipeline Usage

```python
from transformers import pipeline

# Create a text-generation pipeline; reuses `model` and `tokenizer` from Basic Usage
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Ask medical question
messages = [
    {"role": "system", "content": "You are a knowledgeable medical assistant. Provide accurate information about medical conditions and diseases. Always think step by step."},
    {"role": "user", "content": "What causes high blood pressure?"}
]

result = pipe(messages)
print(result[0]['generated_text'][-1]['content'])
```

### Multi-Turn Conversation

```python
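# Reuses `model` and `tokenizer` loaded in Basic Usage
def generate_response(history, max_new_tokens=512):
    inputs = tokenizer.apply_chat_template(
        history, tokenize=True, add_generation_prompt=True,
        return_dict=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True,
        temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id
    )
    # Return only the newly generated tokens, without the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

conversation_history = [
    {
        "role": "system",
        "content": "You are a knowledgeable medical assistant. Provide accurate information about medical conditions and diseases. Always think step by step."
    }
]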

# First question
conversation_history.append({"role": "user", "content": "What is asthma?"})
response = generate_response(conversation_history)
conversation_history.append({"role": "assistant", "content": response})

# Follow-up question
conversation_history.append({"role": "user", "content": "What triggers asthma attacks?"})
response = generate_response(conversation_history)
print(response)
```

### Merging Adapter with Base Model

```python
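from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the adapter on top of the base model, then fold the LoRA weights in
model = AutoPeftModelForCausalLM.from_pretrained(
    "sabber/medphi-medical-qa-adapter",
    torch_dtype="auto",
    device_map="auto"
)
merged_model = model.merge_and_unload()

# Save the merged model together with its tokenizer
tokenizer = AutoTokenizer.from_pretrained("sabber/medphi-medical-qa-adapter")
merged_model.save_pretrained("medphi-medical-qa-merged")
tokenizer.save_pretrained("medphi-medical-qa-merged")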
```

## System Prompt

The model was fine-tuned with the following system prompt; use the same prompt at inference time for best results:

```
You are a knowledgeable medical assistant. Provide accurate information about
medical conditions and diseases. Always think step by step.
```

This prompt encourages:
- **Structured reasoning:** Step-by-step explanations
- **Accuracy focus:** Emphasis on providing correct medical information
- **Comprehensive coverage:** Detailed responses covering multiple aspects

## Limitations and Bias

### Limitations

1. **Training Data Scope:** Model trained on 16,406 Q&A pairs; may not cover all medical conditions
2. **Not a Medical Professional:** Cannot replace professional medical advice or diagnosis
3. **Language:** English only
4. **Clinical Validation:** Outputs should be reviewed by healthcare professionals before clinical application
5. **Rare Conditions:** Performance may vary for extremely rare or newly discovered conditions
6. **Quantization Effects:** 4-bit quantization may affect precision in certain edge cases

### Bias Considerations

- **Dataset Bias:** Training data may reflect biases present in medical literature
- **Language Bias:** Trained exclusively on English medical content
- **Regional Bias:** May reflect medical practices and terminology from specific regions
- **Completeness:** May provide more detailed responses for well-documented conditions

### Ethical Considerations

- **Not for Diagnosis:** This model should NOT be used for self-diagnosis or medical decision-making
- **Professional Review Required:** All outputs must be reviewed by qualified healthcare professionals
- **Patient Safety:** Users should always consult with licensed medical professionals for health concerns
- **Transparency:** Users should be informed when AI-generated medical content is provided
- **Privacy:** Do not share personally identifiable health information when using this model

## Intended Use

### Primary Use Cases

- ✅ **Medical Education:** Teaching medical concepts and terminology
- ✅ **Patient Information:** Providing general information about conditions and diseases
- ✅ **Research Assistant:** Helping researchers understand medical concepts
- ✅ **Content Generation:** Creating draft content for medical education materials
- ✅ **Conversational AI:** Building medical information chatbots and assistants

### Out-of-Scope Use

- ❌ **Clinical Diagnosis:** Not validated for diagnostic purposes
- ❌ **Treatment Planning:** Not suitable for creating treatment plans
- ❌ **Emergency Response:** Not appropriate for emergency medical situations
- ❌ **Prescription Decisions:** Cannot be used for medication recommendations
- ❌ **Mental Health Crisis:** Not designed for crisis intervention or counseling
- ❌ **Legal/Medical Records:** Not validated for official medical documentation

## Evaluation Benchmarks

The model is set up for evaluation on the following standard medical benchmarks:

- **MedQA:** Medical Question Answering benchmark
- **MedMCQA:** Multiple Choice Medical Questions
- **PubMedQA:** Biomedical literature question answering
- **MMLU Medical Subsets:**
  - Anatomy
  - Clinical Knowledge
  - College Medicine
  - Medical Genetics
  - Professional Medicine

*Note: Comprehensive benchmark results will be added as evaluation completes.*

## Future Improvements

Suggested enhancements based on current limitations:

1. **Increase LoRA Rank:** Higher rank for greater model capacity
2. **Higher-Precision Training:** Keep the base weights in FP16/BF16 or FP32 instead of 4-bit quantization
3. **Data Augmentation:** Expand training data with more diverse medical sources
4. **Error Analysis:** Systematic analysis of model failure cases
5. **Benchmark Evaluation:** Complete evaluation on medical QA benchmarks
6. **Multi-lingual Support:** Extend to support multiple languages
7. **Clinical Validation:** Formal evaluation by medical professionals

## Model Architecture

### Base Model: MediPhi-Instruct (based on Phi-3.5-mini-instruct)

**Key Components:**
- **Parameters:** 3.8 billion
- **Vocabulary Size:** 32,064 tokens
- **Hidden Dimension:** 3,072
- **Layers:** 32 Phi3DecoderLayers
- **Attention:** Multi-head self-attention with rotary positional embeddings
- **Activation:** SiLU (Swish) activation function
- **Normalization:** RMSNorm

**LoRA Target Modules:**
- `o_proj` - Output projection in attention
- `qkv_proj` - Query-Key-Value projection in attention
- `gate_up_proj` - Gate and up projection in MLP
- `down_proj` - Down projection in MLP
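
The trainable-parameter figure can be sanity-checked from these module shapes. In the sketch below, the hidden size, layer count, and rank come from this card. The MLP width of 8,192 and the fused `qkv_proj` output of 3 × 3,072 (no grouped-query attention) are assumptions about the Phi-3-mini architecture that the card does not state.

```python
HIDDEN = 3072         # from this card
LAYERS = 32           # from this card
RANK = 8              # LoRA r from this card
INTERMEDIATE = 8192   # assumption: MLP width, not stated in this card

# (input features, output features) of each targeted linear layer
modules = {
    "qkv_proj": (HIDDEN, 3 * HIDDEN),            # assumption: fused Q, K, V of equal size
    "o_proj": (HIDDEN, HIDDEN),
    "gate_up_proj": (HIDDEN, 2 * INTERMEDIATE),  # fused gate and up projections
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """A is r x d_in and B is d_out x r, so the pair holds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in modules.values())
total = per_layer * LAYERS
share = 100 * total / (3.8e9 + total)  # base size rounded to 3.8B, as in this card

print(per_layer)          # 393216
print(total)              # 12582912
print(f"{share:.2f}%")    # 0.33%
```

Under these assumptions the count comes to about 12.6M, in line with the roughly 12.5M parameters and 0.328% reported under Model Description.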

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{medphi-medical-qa-adapter,
  author = {Sabber Ahamed},
  title = {MediPhi Medical QA Adapter: LoRA Fine-tuning for Medical Question Answering},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/sabber/medphi-medical-qa-adapter}},
  note = {Fine-tuned on 16,406 medical Q&A pairs for patient education and medical information retrieval}
}
```

Please also cite the base MediPhi model:

```bibtex
@article{medphi2024,
  title={MediPhi: A Medical Language Model},
  author={Microsoft Research},
  journal={arXiv preprint},
  year={2024}
}
```

## Model Card Authors

Sabber Ahamed

## Model Card Contact

For questions, issues, or feedback, please:
- Open a discussion on the [model repository](https://huggingface.co/sabber/medphi-medical-qa-adapter/discussions)
- Contact via Hugging Face profile

## Acknowledgments

- **Base Model:** Microsoft MediPhi-Instruct team
- **Framework:** Hugging Face Transformers, PEFT, and TRL libraries
- **Compute:** GPU infrastructure for model training
- **Community:** Open-source ML and medical NLP communities

## Additional Resources

- **Training Code:** Available in project repository
- **Evaluation Scripts:** Provided for reproducibility
- **Documentation:** Comprehensive README with implementation details

---

**Medical Disclaimer:** This model is provided for educational and research purposes only. It is NOT approved for clinical use, medical diagnosis, or treatment planning. All medical information should be verified by qualified healthcare professionals. In case of medical emergencies, contact emergency services immediately. Always consult with licensed medical professionals for health concerns and treatment decisions.

**Technical Disclaimer:** This model may generate incorrect or incomplete information. Users should verify all outputs and use appropriate safeguards when deploying in production environments. The model's responses should be reviewed and validated before any public-facing use.