dnagpt/human_gpt2-v1 is a DNA language model with the GPT-2 architecture, trained on human genome data (dataset: dnagpt/human_genome_GCF_009914755.1).

Key features of our dnagpt models:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('dnagpt/human_gpt2-v1')

# Split a raw DNA string into variable-length tokens
tokens = tokenizer.tokenize("GAGCACATTCGCCTGCGTGCGCACTCACACACACGTTCAAAAAGAGTCCATTCGATTCTGGCAGTAG")
print(tokens)
# result: ['G', 'AGCAC', 'ATTCGCC', ...]
```
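The sample result suggests that each token is a contiguous run of bases, so it can be useful to know which positions of the input a token covers, for example to map per-token embeddings back onto the sequence. The sketch below is a hypothetical helper (`token_spans` is not part of Transformers or of this model) and assumes, based only on the sample output, that the tokens concatenate back to the input with no special prefix characters. If the tokenizer is a fast tokenizer, calling it with `return_offsets_mapping=True` should return comparable spans directly.

```python
from typing import List, Tuple


def token_spans(sequence: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return the [start, end) span of each token within `sequence`.

    Hypothetical helper: assumes the tokens are contiguous, non-overlapping
    pieces of the sequence with no special prefix characters.
    """
    spans = []
    pos = 0
    for tok in tokens:
        end = pos + len(tok)
        # Fail loudly if the assumption of lossless, in-order tokens is violated
        if sequence[pos:end] != tok:
            raise ValueError(f"token {tok!r} does not match the sequence at position {pos}")
        spans.append((pos, end))
        pos = end
    return spans


seq = "GAGCACATTCGCCTGCGTGCGCACTCACACACACGTTCAAAAAGAGTCCATTCGATTCTGGCAGTAG"
sample_tokens = ["G", "AGCAC", "ATTCGCC"]  # first three tokens of the sample result
print(token_spans(seq, sample_tokens))  # [(0, 1), (1, 6), (6, 13)]
```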
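To get a fixed-size embedding for a DNA sequence, run the model and pool the per-token hidden states:

```python
import torch

model = AutoModel.from_pretrained('dnagpt/human_gpt2-v1')

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

# Inference only: no gradients needed
with torch.no_grad():
    hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# embedding with mean pooling over the token dimension
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape)  # expect torch.Size([768])

# embedding with max pooling over the token dimension (.values drops the indices)
embedding_max = torch.max(hidden_states[0], dim=0).values
print(embedding_max.shape)  # expect torch.Size([768])
```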
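The pooling above handles a single unpadded sequence. When several sequences of different lengths are embedded as one padded batch, padded positions should be excluded so they do not distort the mean or the max. The following is a minimal sketch of masked mean and max pooling in PyTorch; the function name `masked_mean_max` is hypothetical, and it assumes an attention mask with 1 for real tokens and 0 for padding, which is the convention of the `attention_mask` that Transformers tokenizers return when called with `padding=True`. Whether this tokenizer defines a padding token is not stated here; GPT-2 style tokenizers often do not, in which case one has to be set before batching.

```python
import torch


def masked_mean_max(hidden_states: torch.Tensor, attention_mask: torch.Tensor):
    """Mean- and max-pool token embeddings, ignoring padded positions.

    hidden_states:  [batch, seq_len, hidden]
    attention_mask: [batch, seq_len], 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # [batch, seq_len, 1]

    # Mean: sum only real tokens, divide by the number of real tokens
    summed = (hidden_states * mask).sum(dim=1)                   # [batch, hidden]
    counts = mask.sum(dim=1).clamp(min=1.0)                      # avoid division by zero
    mean = summed / counts

    # Max: overwrite padded positions with the most negative finite value
    fill_value = torch.finfo(hidden_states.dtype).min
    filled = hidden_states.masked_fill(mask == 0, fill_value)
    maxed = filled.max(dim=1).values                             # [batch, hidden]
    return mean, maxed


# Toy batch: two sequences, hidden size 2; the second has one padded position
hs = torch.tensor([[[1.0, 2.0], [3.0, 4.0]],
                   [[5.0, 6.0], [100.0, 100.0]]])
am = torch.tensor([[1, 1], [1, 0]])
mean, maxed = masked_mean_max(hs, am)
print(mean.tolist())   # [[2.0, 3.0], [5.0, 6.0]]
print(maxed.tolist())  # [[3.0, 4.0], [5.0, 6.0]]
```

Filling padded positions with the dtype's most negative finite value rather than `-inf` keeps the max finite even for a row that is entirely padding, and clamping the token count to at least 1 avoids a division by zero in the same case.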