Model Card for ProtGPT3-112M-dpo

Model Description

ProtGPT3-112M-dpo is the DPO-aligned version of ProtGPT3-112M. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein design.

ProtGPT3-112M-dpo was further aligned with Direct Preference Optimization (DPO) to improve generation quality. The alignment procedure shifts the model toward protein sequences with higher predicted structural confidence and reduced low-complexity content, while preserving sequence diversity. For protein generation, the -dpo version of each model is recommended over its base model.
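For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. It is a generic illustration rather than the ProtGPT3 training code: the function names, the `beta` value, and the example numbers are assumptions, and this card does not specify the exact training configuration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value, not taken from the ProtGPT3 setup
) -> float:
    """Standard DPO loss for one (chosen, rejected) sequence pair.

    Each argument is the summed log-probability of a full sequence under
    either the model being trained (policy) or the frozen reference model.
    """
    # How much more the policy likes each sequence than the reference does.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # The loss falls as the policy favors the chosen sequence more strongly.
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Hypothetical log-probabilities for one preferred and one rejected sequence.
loss = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2. It drops below that value as the policy raises the likelihood of the preferred sequence relative to the rejected one.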

For more information and guidance on generating sequences with ProtGPT3-112M-dpo, see the detailed description in the ProtGPT3-1.3B model card and replace the model name (i.e., model_name=AI4PD/ProtGPT3-112M-dpo).

Out-of-Scope Use

The model should not be used as the sole basis for experimental, clinical, environmental, or safety-critical decisions. Generated sequences require downstream computational and experimental validation. The model is not guaranteed to generate functional, soluble, safe, synthesizable, or experimentally successful proteins.

The model should not be used for irresponsible or harmful biological design applications.

Bias, Risks, and Limitations

ProtGPT3-112M-dpo learns from public protein sequence datasets and may reproduce biases present in those datasets. DPO alignment reduces low-complexity generations and improves generation quality as measured by the alignment objectives (pLDDT and reduction of low-complexity regions (LCRs), as a binary objective; see the main manuscript). Even so, generated sequences may still be nonfunctional, unstable, insoluble, repetitive, biologically implausible, or unsuitable for a user’s intended application.

The DPO alignment objective uses predicted structural confidence and low-complexity filtering as proxy objectives. These proxies do not guarantee biological function, experimental success, safety, solubility, or manufacturability.
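Because generated sequences can still be repetitive, a simple post-generation check may be useful. The sketch below flags low-complexity stretches with a sliding-window Shannon entropy. It is an assumption-laden illustration and not the low-complexity filter used for alignment: the function names, the window size, and the entropy threshold are all hypothetical choices.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(
    seq: str,
    window: int = 12,        # illustrative window size
    threshold: float = 2.2,  # illustrative entropy cutoff in bits
) -> float:
    """Fraction of residues covered by at least one low-entropy window."""
    if len(seq) < window:
        return 0.0  # too short to assess with this window size
    flagged = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Mark every residue in the low-entropy window.
            for i in range(start, start + window):
                flagged[i] = True
    return sum(flagged) / len(seq)


# Hypothetical usage: keep only candidates with little low-complexity content.
candidates = ["ACDEFGHIKLMNPQRSTVWY" * 2, "A" * 30]
kept = [s for s in candidates if low_complexity_fraction(s) < 0.2]
```

A check like this only screens for compositional repetitiveness. It says nothing about fold, function, or solubility, so it complements structure prediction and experimental validation rather than replacing them.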

As with other generative protein models, ProtGPT3-112M-dpo may present dual-use risks if applied irresponsibly.

Citation

BibTeX:

@article{protgpt3,
  title={ProtGPT3: an Open-source family of Promptable and Aligned Protein Language Models},
  author={Anonymous Authors},
  year={2026}
}

More Information

All models and code are released through the Hugging Face ecosystem and accompanying code repository.


Safetensors: model size 0.1B params, tensor type BF16
