Model Card for llama32-1b-python-docstrings-qlora

A LoRA adapter for meta-llama/Llama-3.2-1B-Instruct, fine-tuned with QLoRA to generate concise one-line Python docstrings from function bodies.

Model Details

Model Description

  • Developed by: Abdullah Al-Housni
  • Model type: Causal language model with LoRA/QLoRA adapters
  • Language(s): Python code as input, English docstrings as output
  • License: Same as meta-llama/Llama-3.2-1B-Instruct (Meta Llama 3.2 Community License)
  • Finetuned from model: meta-llama/Llama-3.2-1B-Instruct

The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.

Uses

Direct Use

  • Automatically generate one-line Python docstrings for functions.
  • Improve or bootstrap documentation in Python codebases.
  • Educational use for learning how to summarize code behavior.

Typical usage pattern:

  • Input: Python function body (source code).
  • Output: Single-sentence English description suitable as a docstring.
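
For illustration, a hypothetical input/output pair (this example is not taken from the training set; the exact prompt template used at inference time is shown in the getting-started example below):

# Input function:
def is_even(n):
    return n % 2 == 0

# Expected model output (one-line docstring):
# "Return True if n is even, otherwise False."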

Out-of-Scope Use

  • Generating full, multi-paragraph API documentation.
  • Security auditing or correctness guarantees for code.
  • Use outside Python (e.g., other programming languages) without additional fine-tuning.
  • Any safety-critical application where incorrect summaries could cause harm.

Bias, Risks, and Limitations

  • The model can produce incorrect or incomplete summaries, especially for complex or ambiguous functions.
  • It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
  • It does not understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.

Recommendations

  • Use the model as an assistive tool, not an authoritative source.
  • Always review and edit generated docstrings before committing to production code.
  • For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.

How to Get Started with the Model

Example with πŸ€— Transformers and PEFT (LoRA adapter):

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def make_prompt(code: str) -> str:
    # The trailing triple quote cues the model to continue with the docstring text.
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
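
Because the adapter was trained with QLoRA, the base model can also be loaded in 4-bit at inference time to reduce memory. A minimal variant of the loading step above, assuming bitsandbytes is installed (the specific quantization settings below are typical QLoRA defaults and are an assumption, not a record of the training run):

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute, mirroring a common QLoRA setup (assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)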

Training Details

Training Data

  • Dataset: Python subset of CodeSearchNet (Nan-Do/code-search-net-python)
  • Inputs: code column (full Python function body)
  • Targets: First non-empty line of docstring
  • A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning
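
A minimal preprocessing sketch, assuming the dataset provides a train split with code and docstring columns (the column names and the exact subset size are assumptions):

from datasets import load_dataset

ds = load_dataset("Nan-Do/code-search-net-python", split="train")

def first_docstring_line(example):
    # Keep only the first non-empty line of the docstring as the target.
    lines = [line.strip() for line in example["docstring"].splitlines()]
    lines = [line for line in lines if line]
    return {"target": lines[0] if lines else ""}

ds = ds.map(first_docstring_line)
ds = ds.filter(lambda ex: ex["target"] != "")   # drop examples with no usable docstring
ds = ds.shuffle(seed=42).select(range(2000))    # subsample to roughly the size reported above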

Training Procedure

  • Objective: Causal language modeling (predict the docstring continuation)
  • Method: QLoRA (4-bit quantized base model with LoRA adapters)
  • Precision: 4-bit quantized weights, bf16 compute
  • Epochs: 1
  • Max sequence length: 256–512 tokens
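
Under the causal-LM objective, each training example is the prompt concatenated with its target docstring, and the model learns to produce the continuation. A minimal sketch of the assumed formatting (the exact training template is not recorded in this card; this simply mirrors the inference prompt above):

def build_training_text(code: str, one_line_docstring: str) -> str:
    # Prompt mirrors the inference-time template; the model learns to complete
    # the docstring and close the triple quotes.
    prompt = f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'
    return prompt + one_line_docstring + '"""'

example_text = build_training_text("def add(a, b):\n    return a + b",
                                   "Return the sum of a and b.")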

Training Hyperparameters

  • Learning rate: ~2e-4 (adapter weights only)
  • Epochs: 1
  • Optimizer: AdamW via Hugging Face Trainer
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
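
A minimal sketch of the adapter and trainer configuration implied by these hyperparameters. The target modules and batch size are assumptions not stated in this card; model, tokenizer, and the tokenized dataset come from the earlier snippets (the base model loaded in 4-bit as in the getting-started example):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

model = prepare_model_for_kbit_training(model)   # model: 4-bit quantized base model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=4,   # assumed; not reported in the card
    bf16=True,
    logging_steps=10,
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,   # tokenized version of the dataset sketched above
    data_collator=collator,
)
trainer.train()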

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out test split from the same CodeSearchNet Python dataset, using identical code β†’ one-line docstring mapping.

Factors

  • Function size and complexity
  • Variety in docstring writing styles
  • Presence of short or noisy docstrings

Metrics

  • BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
  • ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): recall-oriented overlap, better suited to short summaries
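
A minimal sketch of how these metrics can be computed with the πŸ€— evaluate library (the use of evaluate here is an assumption about tooling, not a record of the original evaluation script):

import evaluate

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

# predictions: generated one-line docstrings; references: held-out targets
predictions = ["Return the sum of a and b."]
references = ["Return the sum of two numbers."]

bleu_score = sacrebleu.compute(predictions=predictions,
                               references=[[r] for r in references])
rouge_scores = rouge.compute(predictions=predictions, references=references)

print(bleu_score["score"])           # corpus-level BLEU
print(rouge_scores["rouge1"], rouge_scores["rouge2"], rouge_scores["rougeL"])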

Results

Approximate performance on ~50 held-out samples:

  • BLEU: ~12.4
  • ROUGE-1: ~0.78
  • ROUGE-2: ~0.74
  • ROUGE-L: ~0.78

Summary

The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. Overall, this is strong performance for a 1B-parameter model fine-tuned briefly on a small dataset.


Model Examination

Not applicable.


Environmental Impact

  • Hardware Type: Google Colab GPU (T4/L4)
  • Hours Used: ~0.5–1 hour total
  • Cloud Provider: Google Colab
  • Compute Region: US
  • Carbon Emitted: Not estimated (very low due to minimal training time)

Technical Specifications

Model Architecture and Objective

  • Base model: Llama 3.2 1B Instruct
  • Architecture: Decoder-only transformer
  • Objective: Causal language modeling
  • Parameter-efficient fine-tuning using LoRA (rank 16)

Compute Infrastructure

Hardware

Single Google Colab GPU (T4 or L4)

Software

  • Python
  • PyTorch
  • Hugging Face Transformers
  • PEFT
  • bitsandbytes
  • Datasets

Citation

Not applicable.


Glossary

Not applicable.


More Information

See the Hugging Face model page for updates or usage examples.


Model Card Authors

Abdullah Al-Housni


Model Card Contact

Available through the Hugging Face model repository.
