Finetuning on Phosphosite Sequences with MLM Objective on ESM-1b Architecture
This repository provides an ESM-1b model finetuned on phosphosite sequences: the weights are initialized from the pretrained (original) ESM-1b checkpoint and then finetuned with the Masked Language Modeling (MLM) objective. The model was finetuned on long phosphosite-containing peptide sequences derived from PhosphoSitePlus.
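The checkpoint can be loaded with the Hugging Face transformers ESM classes. Below is a minimal masked-residue prediction sketch, assuming the checkpoint is stored in standard transformers format; the peptide is an arbitrary illustrative sequence, not taken from the training data.

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

model_name = "isikz/esm1b_mlm_ft_phosphosite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EsmForMaskedLM.from_pretrained(model_name)
model.eval()

# Mask a single residue in an arbitrary example peptide.
peptide = "RRRSVA" + tokenizer.mask_token + "PELLRQK"
inputs = tokenizer(peptide, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and decode the most likely amino acid there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_residue = tokenizer.decode(logits[0, mask_pos].argmax(dim=-1))
print(predicted_residue)
```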
Developed by:
Zeynep Işık (MSc, Sabanci University)
Training Details
- Architecture: ESM-1b (finetuned)
- Objective: Masked Language Modeling (MLM)
- Dataset: Unlabeled phosphosites from PhosphoSitePlus
- Total Samples: 352,453 (10% separated for validation)
- Sequence Length: ≤ 128 residues
- Batch Size: 64
- Optimizer: AdamW
- Learning Rate: default
- Training Duration: 1.5 days
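A hypothetical sketch of this setup with the Hugging Face Trainer is shown below. The data file name, masking probability, and epoch count are assumptions for illustration, not details taken from the original training run.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EsmForMaskedLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/esm1b_t33_650M_UR50S")
model = EsmForMaskedLM.from_pretrained("facebook/esm1b_t33_650M_UR50S")

# One peptide sequence per line; the file name is a placeholder.
data = load_dataset("text", data_files={"train": "phosphosite_peptides.txt"})["train"]
data = data.train_test_split(test_size=0.1)  # 10% held out for validation

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = data.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic token masking for the MLM objective (15% is the usual default, assumed here).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="esm1b_mlm_ft_phosphosite",
    per_device_train_batch_size=64,
    num_train_epochs=3,  # assumption; the card only states ~1.5 days of training
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,  # Trainer uses AdamW with its default learning rate
)
trainer.train()
```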
Finetuning Performance
- Perplexity at Start: 5.42
- Perplexity at End: 2.27

The significant decrease in perplexity indicates that the model has effectively learned meaningful representations of phosphosite-related sequences.
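Perplexity here is the exponential of the mean masked-token cross-entropy loss. Continuing the hypothetical Trainer sketch above, it could be computed on the validation split as:

```python
import math

# Perplexity = exp(mean masked-token cross-entropy loss) on the held-out 10%.
eval_loss = trainer.evaluate()["eval_loss"]
perplexity = math.exp(eval_loss)
print(f"validation perplexity: {perplexity:.2f}")
```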
Potential Use Cases
This finetuned model can be used for downstream tasks requiring phosphosite knowledge, such as:
- ✅ Binary classification of phosphosites
- ✅ Kinase-specific phosphorylation site prediction
- ✅ Protein-protein interaction prediction involving phosphosites
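For such feature-based downstream use, one possible (hypothetical) pattern is to extract encoder embeddings and train a separate classifier on top; the peptide below is an arbitrary example, not from the dataset.

```python
import torch
from transformers import AutoTokenizer, EsmModel

model_name = "isikz/esm1b_mlm_ft_phosphosite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = EsmModel.from_pretrained(model_name)  # loads the encoder only; the LM head is dropped
encoder.eval()

# Arbitrary example peptide centered on a candidate phosphosite.
peptide = "LKRASLGSSPVKT"
inputs = tokenizer(peptide, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    hidden_states = encoder(**inputs).last_hidden_state  # shape (1, seq_len, 1280)

# Mean-pool over residues to obtain a fixed-size feature vector for a classifier.
features = hidden_states.mean(dim=1)
```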
Model tree for isikz/esm1b_mlm_ft_phosphosite
Base model: facebook/esm1b_t33_650M_UR50S