---
library_name: transformers
tags:
- llama-3.2
- causal-lm
- code
- python
- peft
- qlora
---
# Model Card for llama32-1b-python-docstrings-qlora
A parameter-efficient (QLoRA) adapter fine-tuned on top of `meta-llama/Llama-3.2-1B-Instruct` to generate concise one-line Python docstrings from function bodies.
## Model Details
### Model Description
- **Developed by:** Abdullah Al-Housni
- **Model type:** Causal language model with LoRA/QLoRA adapters
- **Language(s):** Python code as input, English docstrings as output
- **License:** Same as `meta-llama/Llama-3.2-1B-Instruct` (Meta Llama 3.2 Community License)
- **Finetuned from model:** `meta-llama/Llama-3.2-1B-Instruct`
The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.
## Uses
### Direct Use
- Automatically generate one-line Python docstrings for functions.
- Improve or bootstrap documentation in Python codebases.
- Educational use for learning how to summarize code behavior.
Typical usage pattern:
- Input: Python function body (source code).
- Output: Single-sentence English description suitable as a docstring.
### Out-of-Scope Use
- Generating full, multi-paragraph API documentation.
- Security auditing or correctness guarantees for code.
- Use outside Python (e.g., other programming languages) without additional fine-tuning.
- Any safety-critical application where incorrect summaries could cause harm.
## Bias, Risks, and Limitations
- The model can produce **incorrect or incomplete summaries**, especially for complex or ambiguous functions.
- It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
- It does **not** understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.
### Recommendations
- Use the model as an **assistive tool**, not an authoritative source.
- Always review and edit generated docstrings before committing to production code.
- For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.
## How to Get Started with the Model
Example with 🤗 Transformers and PEFT (LoRA adapter):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"

# Load the base model and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def make_prompt(code: str) -> str:
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens (skip the prompt).
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```
## Training Details
### Training Data
- Dataset: Python subset of CodeSearchNet (`Nan-Do/code-search-net-python`)
- Inputs: `code` column (full Python function body)
- Targets: First non-empty line of `docstring`
- A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning
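A minimal preprocessing sketch, assuming the dataset exposes the `code` and `docstring` columns described above (the split name, length threshold, and subset size are illustrative, not recorded values):
```python
from datasets import load_dataset

ds = load_dataset("Nan-Do/code-search-net-python", split="train")

def to_example(row):
    # Target is the first non-empty line of the original docstring.
    first_line = next(
        (line.strip() for line in row["docstring"].splitlines() if line.strip()),
        "",
    )
    return {"code": row["code"], "target": first_line}

ds = ds.map(to_example)
# Drop rows with empty or overly long targets (the threshold is an assumption).
ds = ds.filter(lambda r: 0 < len(r["target"]) <= 200)
# The card reports roughly 1,000-2,000 filtered training examples,
# with ~50 examples held out for evaluation.
ds = ds.shuffle(seed=42).select(range(2000))
splits = ds.train_test_split(test_size=50, seed=42)
```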
### Training Procedure
- Objective: Causal language modeling (predict the docstring continuation)
- Method: QLoRA (4-bit quantized base model with LoRA adapters)
- Precision: 4-bit quantized weights, bf16 compute
- Epochs: 1
- Max sequence length: 256–512 tokens
#### Training Hyperparameters
- Learning rate: ~2e-4 (adapter weights only)
- Epochs: 1
- Optimizer: AdamW via Hugging Face `Trainer`
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
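A configuration sketch consistent with the settings above, assuming NF4 4-bit quantization and typical Llama attention-projection targets (the `target_modules` list and batch size are assumptions, not recorded values):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantized base weights with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA settings from the list above; target_modules is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=4,  # assumption; not recorded in the card
    bf16=True,
)
```
These arguments would then be passed to the Hugging Face `Trainer` (which uses AdamW by default) together with the tokenized prompt/target pairs.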
---
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Held-out test split from the same CodeSearchNet Python dataset, using identical `code` → one-line docstring mapping.
#### Factors
- Function size and complexity
- Variety in docstring writing styles
- Presence of short or noisy docstrings
#### Metrics
- BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
- ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): better for short summaries
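A minimal scoring sketch using the 🤗 `evaluate` library; `preds` and `refs` stand in for the generated and reference one-line docstrings from the held-out split:
```python
import evaluate

# Hypothetical strings; in practice these come from the ~50 held-out samples.
preds = ["Return the sum of a and b."]
refs = ["Add two numbers and return the result."]

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

bleu = sacrebleu.compute(predictions=preds, references=[[r] for r in refs])
rouge_scores = rouge.compute(predictions=preds, references=refs)

print(f"BLEU: {bleu['score']:.1f}")
print({k: round(v, 3) for k, v in rouge_scores.items()})
```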
### Results
Approximate performance on ~50 held-out samples:
- BLEU: ~12.4
- ROUGE-1: ~0.78
- ROUGE-2: ~0.74
- ROUGE-L: ~0.78
#### Summary
The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. This is strong performance for a 1B model trained briefly on a small dataset.
---
## Model Examination
Not applicable.
---
## Environmental Impact
- Hardware Type: Google Colab GPU (T4/L4)
- Hours Used: ~0.5–1 hour total
- Cloud Provider: Google Colab
- Compute Region: US
- Carbon Emitted: Not estimated (very low due to minimal training time)
---
## Technical Specifications
### Model Architecture and Objective
- Base model: Llama 3.2 1B Instruct
- Architecture: Decoder-only transformer
- Objective: Causal language modeling
- Parameter-efficient fine-tuning using LoRA (rank 16)
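As a rough check of the adapter's footprint, the LoRA parameter count can be compared against the full model (a sketch; the `lora_` name filter relies on PEFT's default module naming):
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "Abdul1102/llama32-1b-python-docstrings-qlora")

# Count adapter parameters versus total parameters.
lora_params = sum(p.numel() for n, p in model.named_parameters() if "lora_" in n)
total_params = sum(p.numel() for p in model.parameters())
print(f"LoRA parameters: {lora_params:,} of {total_params:,} total")
```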
### Compute Infrastructure
#### Hardware
Single Google Colab GPU (T4 or L4)
#### Software
- Python
- PyTorch
- Hugging Face Transformers
- PEFT
- bitsandbytes
- Datasets
---
## Citation
Not applicable.
---
## Glossary
Not applicable.
---
## More Information
See the Hugging Face model page for updates or usage examples.
---
## Model Card Authors
Abdullah Al-Housni
---
## Model Card Contact
Available through the Hugging Face model repository.