| --- |
| license: mit |
| pipeline_tag: feature-extraction |
| tags: |
| - biology |
| - Gene |
| - Protein |
| - GO |
| - MLM |
| - Gene function |
| - Gene Ontology |
| - DAG |
| - Protein function |
| --- |
| |
| ## Model Details |
| GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction. |
|
|
| ### Model Description |
| GoBERT is the first encoder to capture relations among Gene Ontology (GO) functions. It generates GO function embeddings for biological applications related to genes or gene products. For the Gene-GO function mapping database, please refer to our previous work, UniEntrezDB (UniEntrezGOA.zip at https://zenodo.org/records/13335548). |
|
|
|
|
|
|
| ### Model Sources |
|
|
| - **Repository:** https://github.com/MM-YY-WW/GoBERT |
| - **Paper:** GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction. (AAAI-25) |
| - **Demo:** https://gobert.nasy.moe/ |
|
|
| ## How to Get Started with the Model |
|
|
| Use the code below to get started with the model. |
|
|
| ```python |
| from transformers import AutoTokenizer, BertForPreTraining |
| import torch |
| |
| repo_name = "MM-YY-WW/GoBERT" |
| tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=False, trust_remote_code=True) |
| model = BertForPreTraining.from_pretrained(repo_name) |
| |
| # Obtain function-level GoBERT Embedding: |
| input_sequences = 'GO:0005739 GO:0005783 GO:0005829 GO:0006914 GO:0006915 GO:0006979 GO:0031966 GO:0051560' |
| tokenized_input = tokenizer(input_sequences) |
| input_tensor = torch.tensor(tokenized_input['input_ids']).unsqueeze(0) |
| attention_mask = torch.tensor(tokenized_input['attention_mask']).unsqueeze(0) |
| |
| model.eval() |
| with torch.no_grad(): |
|     outputs = model(input_ids=input_tensor, attention_mask=attention_mask, output_hidden_states=True) |
| # Last-layer hidden states, one row per token in input_ids: shape (sequence_length, hidden_size) |
| embedding = outputs.hidden_states[-1].squeeze(0).cpu().numpy() |
| ``` |
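
The input above is a single string of space-separated GO IDs listed in ascending order. When the annotations for a gene come from a mapping file, they may be unordered or contain duplicates. The sketch below shows one way to normalize such a list into the same shape. `build_go_input` is a hypothetical helper, not part of the GoBERT repository, and whether the model depends on the ordering of the IDs is an assumption made here only to mirror the example.

```python
import re

# GO identifiers are "GO:" followed by seven digits, e.g. "GO:0005739".
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def build_go_input(go_ids):
    """Validate GO IDs, drop duplicates, sort them, and join with spaces.

    Hypothetical helper, not part of the GoBERT repository. Sorting mirrors
    the ascending order of the example input; whether the model depends on
    that order is an assumption.
    """
    cleaned = set()
    for go_id in go_ids:
        go_id = go_id.strip()
        if not GO_ID_PATTERN.fullmatch(go_id):
            raise ValueError(f"Not a valid GO ID: {go_id!r}")
        cleaned.add(go_id)
    # Zero-padded IDs sort numerically under plain string ordering.
    return " ".join(sorted(cleaned))


# Hypothetical annotation list for one gene, unordered and with a duplicate.
annotations = ["GO:0006915", "GO:0005739", "GO:0031966", "GO:0005739"]
input_sequences = build_go_input(annotations)
print(input_sequences)  # GO:0005739 GO:0006915 GO:0031966
```

The resulting string can be passed to `tokenizer(...)` in place of the hard-coded `input_sequences` above. Malformed IDs raise a `ValueError` here rather than reaching the tokenizer.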
|
|
| ## Citation |
|
|
| **BibTeX:** |
|
|
| ```bibtex |
| @inproceedings{miao2025gobert, |
| title={GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction}, |
| author={Miao, Yuwei and Guo, Yuzhi and Ma, Hehuan and Yan, Jingquan and Jiang, Feng and Liao, Rui and Huang, Junzhou}, |
| booktitle={Proceedings of the AAAI Conference on Artificial Intelligence}, |
| volume={39}, |
| number={1}, |
| pages={622--630}, |
| year={2025}, |
| doi={10.1609/aaai.v39i1.32043} |
| } |
| ``` |