Model Card for llama32-1b-python-docstrings-qlora

A LoRA adapter for meta-llama/Llama-3.2-1B-Instruct, fine-tuned with QLoRA to generate concise one-line Python docstrings from function bodies.

Model Details

Model Description

  • Developed by: Abdullah Al-Housni
  • Model type: Causal language model with LoRA/QLoRA adapters
  • Language(s): Python code as input, English docstrings as output
  • License: Same as meta-llama/Llama-3.2-1B-Instruct (Meta Llama 3.2 Community License)
  • Finetuned from model: meta-llama/Llama-3.2-1B-Instruct

The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.

Uses

Direct Use

  • Automatically generate one-line Python docstrings for functions.
  • Improve or bootstrap documentation in Python codebases.
  • Educational use for learning how to summarize code behavior.

Typical usage pattern:

  • Input: Python function body (source code).
  • Output: Single-sentence English description suitable as a docstring.
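
For illustration, a hypothetical input/output pair (this example is not taken from the training set; the exact prompt template used at inference time is shown in the getting-started example below):

# Input function:
def is_even(n):
    return n % 2 == 0

# Expected model output (one-line docstring):
# "Return True if n is even, otherwise False."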

Out-of-Scope Use

  • Generating full, multi-paragraph API documentation.
  • Security auditing or correctness guarantees for code.
  • Use outside Python (e.g., other programming languages) without additional fine-tuning.
  • Any safety-critical application where incorrect summaries could cause harm.

Bias, Risks, and Limitations

  • The model can produce incorrect or incomplete summaries, especially for complex or ambiguous functions.
  • It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
  • It does not understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.

Recommendations

  • Use the model as an assistive tool, not an authoritative source.
  • Always review and edit generated docstrings before committing to production code.
  • For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.

How to Get Started with the Model

Example with πŸ€— Transformers and PEFT (LoRA adapter):

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def make_prompt(code: str) -> str:
    # The trailing triple quote cues the model to continue with the docstring text.
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
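
Because the adapter was trained with QLoRA, the base model can also be loaded in 4-bit at inference time to reduce memory. A minimal variant of the loading step above, assuming bitsandbytes is installed (the specific quantization settings below are typical QLoRA defaults and are an assumption, not a record of the training run):

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute, mirroring a common QLoRA setup (assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)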

Training Details

Training Data

  • Dataset: Python subset of CodeSearchNet (Nan-Do/code-search-net-python)
  • Inputs: code column (full Python function body)
  • Targets: First non-empty line of docstring
  • A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning
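
A minimal preprocessing sketch, assuming the dataset provides a train split with code and docstring columns (the column names and the exact subset size are assumptions):

from datasets import load_dataset

ds = load_dataset("Nan-Do/code-search-net-python", split="train")

def first_docstring_line(example):
    # Keep only the first non-empty line of the docstring as the target.
    lines = [line.strip() for line in example["docstring"].splitlines()]
    lines = [line for line in lines if line]
    return {"target": lines[0] if lines else ""}

ds = ds.map(first_docstring_line)
ds = ds.filter(lambda ex: ex["target"] != "")   # drop examples with no usable docstring
ds = ds.shuffle(seed=42).select(range(2000))    # subsample to roughly the size reported above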

Training Procedure

  • Objective: Causal language modeling (predict the docstring continuation)
  • Method: QLoRA (4-bit quantized base model with LoRA adapters)
  • Precision: 4-bit quantized weights, bf16 compute
  • Epochs: 1
  • Max sequence length: 256–512 tokens
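
Under the causal-LM objective, each training example is the prompt concatenated with its target docstring, and the model learns to produce the continuation. A minimal sketch of the assumed formatting (the exact training template is not recorded in this card; this simply mirrors the inference prompt above):

def build_training_text(code: str, one_line_docstring: str) -> str:
    # Prompt mirrors the inference-time template; the model learns to complete
    # the docstring and close the triple quotes.
    prompt = f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'
    return prompt + one_line_docstring + '"""'

example_text = build_training_text("def add(a, b):\n    return a + b",
                                   "Return the sum of a and b.")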

Training Hyperparameters

  • Learning rate: ~2e-4 (adapter weights only)
  • Epochs: 1
  • Optimizer: AdamW via Hugging Face Trainer
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
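
A minimal sketch of the adapter and trainer configuration implied by these hyperparameters. The target modules and batch size are assumptions not stated in this card; model, tokenizer, and the tokenized dataset come from the earlier snippets (the base model loaded in 4-bit as in the getting-started example):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

model = prepare_model_for_kbit_training(model)   # model: 4-bit quantized base model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=4,   # assumed; not reported in the card
    bf16=True,
    logging_steps=10,
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,   # tokenized version of the dataset sketched above
    data_collator=collator,
)
trainer.train()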

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out test split from the same CodeSearchNet Python dataset, using identical code β†’ one-line docstring mapping.

Factors

  • Function size and complexity
  • Variety in docstring writing styles
  • Presence of short or noisy docstrings

Metrics

  • BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
  • ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): recall-oriented overlap, better suited to short summaries
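
A minimal sketch of how these metrics can be computed with the πŸ€— evaluate library (the use of evaluate here is an assumption about tooling, not a record of the original evaluation script):

import evaluate

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

# predictions: generated one-line docstrings; references: held-out targets
predictions = ["Return the sum of a and b."]
references = ["Return the sum of two numbers."]

bleu_score = sacrebleu.compute(predictions=predictions,
                               references=[[r] for r in references])
rouge_scores = rouge.compute(predictions=predictions, references=references)

print(bleu_score["score"])           # corpus-level BLEU
print(rouge_scores["rouge1"], rouge_scores["rouge2"], rouge_scores["rougeL"])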

Results

Approximate performance on ~50 held-out samples:

  • BLEU: ~12.4
  • ROUGE-1: ~0.78
  • ROUGE-2: ~0.74
  • ROUGE-L: ~0.78

Summary

The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. Overall, this is strong performance for a 1B-parameter model fine-tuned briefly on a small dataset.


Model Examination

Not applicable.


Environmental Impact

  • Hardware Type: Google Colab GPU (T4/L4)
  • Hours Used: ~0.5–1 hour total
  • Cloud Provider: Google Colab
  • Compute Region: US
  • Carbon Emitted: Not estimated (very low due to minimal training time)

Technical Specifications

Model Architecture and Objective

  • Base model: Llama 3.2 1B Instruct
  • Architecture: Decoder-only transformer
  • Objective: Causal language modeling
  • Parameter-efficient fine-tuning using LoRA (rank 16)

Compute Infrastructure

Hardware

Single Google Colab GPU (T4 or L4)

Software

  • Python
  • PyTorch
  • Hugging Face Transformers
  • PEFT
  • bitsandbytes
  • Datasets

Citation

Not applicable.


Glossary

Not applicable.


More Information

See the Hugging Face model page for updates or usage examples.


Model Card Authors

Abdullah Al-Housni


Model Card Contact

Available through the Hugging Face model repository.
