---
library_name: transformers
tags:
- llama-3.2
- causal-lm
- code
- python
- peft
- qlora
---
# Model Card for llama32-1b-python-docstrings-qlora
A parameter-efficient (QLoRA) adapter fine-tuned on top of `meta-llama/Llama-3.2-1B-Instruct` to generate concise one-line Python docstrings from function bodies.
## Model Details
### Model Description
- **Developed by:** Abdullah Al-Housni
- **Model type:** Causal language model with LoRA/QLoRA adapters
- **Language(s):** Python code as input, English docstrings as output
- **License:** Same as `meta-llama/Llama-3.2-1B-Instruct` (Meta Llama 3.2 Community License)
- **Finetuned from model:** `meta-llama/Llama-3.2-1B-Instruct`
The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.
## Uses
### Direct Use
- Automatically generate one-line Python docstrings for functions.
- Improve or bootstrap documentation in Python codebases.
- Educational use for learning how to summarize code behavior.
Typical usage pattern:
- Input: Python function body (source code).
- Output: Single-sentence English description suitable as a docstring.
### Out-of-Scope Use
- Generating full, multi-paragraph API documentation.
- Security auditing or correctness guarantees for code.
- Use outside Python (e.g., other programming languages) without additional fine-tuning.
- Any safety-critical application where incorrect summaries could cause harm.
## Bias, Risks, and Limitations
- The model can produce **incorrect or incomplete summaries**, especially for complex or ambiguous functions.
- It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
- It does **not** understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.
### Recommendations
- Use the model as an **assistive tool**, not an authoritative source.
- Always review and edit generated docstrings before committing to production code.
- For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.
## How to Get Started with the Model
Example with 🤗 Transformers and PEFT (LoRA adapter):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"

# Load the base model and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def make_prompt(code: str) -> str:
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens (skip the prompt).
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```
## Training Details
### Training Data
- Dataset: Python subset of CodeSearchNet (`Nan-Do/code-search-net-python`)
- Inputs: `code` column (full Python function body)
- Targets: First non-empty line of `docstring`
- A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning
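A minimal preprocessing sketch, assuming the dataset exposes the `code` and `docstring` columns described above (the split name, length threshold, and subset size are illustrative, not recorded values):
```python
from datasets import load_dataset

ds = load_dataset("Nan-Do/code-search-net-python", split="train")

def to_example(row):
    # Target is the first non-empty line of the original docstring.
    first_line = next(
        (line.strip() for line in row["docstring"].splitlines() if line.strip()),
        "",
    )
    return {"code": row["code"], "target": first_line}

ds = ds.map(to_example)
# Drop rows with empty or overly long targets (the threshold is an assumption).
ds = ds.filter(lambda r: 0 < len(r["target"]) <= 200)
# The card reports roughly 1,000-2,000 filtered training examples,
# with ~50 examples held out for evaluation.
ds = ds.shuffle(seed=42).select(range(2000))
splits = ds.train_test_split(test_size=50, seed=42)
```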
### Training Procedure
- Objective: Causal language modeling (predict the docstring continuation)
- Method: QLoRA (4-bit quantized base model with LoRA adapters)
- Precision: 4-bit quantized weights, bf16 compute
- Epochs: 1
- Max sequence length: 256–512 tokens
#### Training Hyperparameters
- Learning rate: ~2e-4 (adapter weights only)
- Epochs: 1
- Optimizer: AdamW via Hugging Face `Trainer`
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
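A configuration sketch consistent with the settings above, assuming NF4 4-bit quantization and typical Llama attention-projection targets (the `target_modules` list and batch size are assumptions, not recorded values):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantized base weights with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA settings from the list above; target_modules is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=4,  # assumption; not recorded in the card
    bf16=True,
)
```
These arguments would then be passed to the Hugging Face `Trainer` (which uses AdamW by default) together with the tokenized prompt/target pairs.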
---
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Held-out test split from the same CodeSearchNet Python dataset, using identical `code` → one-line docstring mapping.
#### Factors
- Function size and complexity
- Variety in docstring writing styles
- Presence of short or noisy docstrings
#### Metrics
- BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
- ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): better for short summaries
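A minimal scoring sketch using the 🤗 `evaluate` library; `preds` and `refs` stand in for the generated and reference one-line docstrings from the held-out split:
```python
import evaluate

# Hypothetical strings; in practice these come from the ~50 held-out samples.
preds = ["Return the sum of a and b."]
refs = ["Add two numbers and return the result."]

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

bleu = sacrebleu.compute(predictions=preds, references=[[r] for r in refs])
rouge_scores = rouge.compute(predictions=preds, references=refs)

print(f"BLEU: {bleu['score']:.1f}")
print({k: round(v, 3) for k, v in rouge_scores.items()})
```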
### Results
Approximate performance on ~50 held-out samples:
- BLEU: ~12.4
- ROUGE-1: ~0.78
- ROUGE-2: ~0.74
- ROUGE-L: ~0.78
#### Summary
The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. This is strong performance for a 1B model trained briefly on a small dataset.
---
## Model Examination
Not applicable.
---
## Environmental Impact
- Hardware Type: Google Colab GPU (T4/L4)
- Hours Used: ~0.5–1 hour total
- Cloud Provider: Google Colab
- Compute Region: US
- Carbon Emitted: Not estimated (very low due to minimal training time)
---
## Technical Specifications
### Model Architecture and Objective
- Base model: Llama 3.2 1B Instruct
- Architecture: Decoder-only transformer
- Objective: Causal language modeling
- Parameter-efficient fine-tuning using LoRA (rank 16)
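As a rough check of the adapter's footprint, the LoRA parameter count can be compared against the full model (a sketch; the `lora_` name filter relies on PEFT's default module naming):
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "Abdul1102/llama32-1b-python-docstrings-qlora")

# Count adapter parameters versus total parameters.
lora_params = sum(p.numel() for n, p in model.named_parameters() if "lora_" in n)
total_params = sum(p.numel() for p in model.parameters())
print(f"LoRA parameters: {lora_params:,} of {total_params:,} total")
```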
### Compute Infrastructure
#### Hardware
Single Google Colab GPU (T4 or L4)
#### Software
- Python
- PyTorch
- Hugging Face Transformers
- PEFT
- bitsandbytes
- Datasets
---
## Citation
Not applicable.
---
## Glossary
Not applicable.
---
## More Information
See the Hugging Face model page for updates or usage examples.
---
## Model Card Authors
Abdullah Al-Housni
---
## Model Card Contact
Available through the Hugging Face model repository.