do_lower_case=True not seeming to work
#2
by thiagotps - opened
I'm testing version v1.1.1 with the following code:
from transformers import AutoTokenizer

_tokenizer = AutoTokenizer.from_pretrained(
    "latincy/latin-bert", revision="v1.1.1", trust_remote_code=True, do_lower_case=True
)
_tokens = _tokenizer("Gallia est omnis divisa in partes tres.", return_tensors='pt')
_token_ids = _tokens['input_ids'][0]
_token_texts = _tokenizer.convert_ids_to_tokens(_token_ids)
_token_texts
and the result is:
[
"[CLS]",
"\\",
"71",
";",
"allia",
"_",
"\\",
"32",
";_",
"est_",
"\\",
"32",
";_",
"omnis_",
"\\",
"32",
";_",
"divisa_",
"\\",
"32",
";_",
"in_",
"\\",
"32",
";_",
"partes_",
"\\",
"32",
";_",
"tres_",
"._",
"[SEP]"
]
It seems the lower() method is still not being applied internally, because the capital G in Gallia was escaped by the tokenizer instead of being lowercased.
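For anyone reading the output above: the escaped pieces look like decimal code points (71 is "G", 32 is a space). Here is a minimal sketch for decoding them from a token list. The escape shape (a backslash token, a number token, then a token starting with ";") is only inferred from this one output, and `decode_escapes` is a hypothetical helper, not part of the tokenizer.

```python
def decode_escapes(tokens):
    """Return the characters that appear to have been escaped.

    Assumption: an escape spans three consecutive tokens, namely a single
    backslash, a decimal code point, and a token starting with ";".
    """
    escaped = []
    for i in range(len(tokens) - 2):
        is_escape = (
            tokens[i] == "\\"
            and tokens[i + 1].isdigit()
            and tokens[i + 2].startswith(";")
        )
        if is_escape:
            # Convert the decimal code point back to its character.
            escaped.append(chr(int(tokens[i + 1])))
    return escaped


# First few tokens copied from the output above.
tokens = ["[CLS]", "\\", "71", ";", "allia", "_", "\\", "32", ";_", "est_", "[SEP]"]
print(decode_escapes(tokens))  # ['G', ' ']
```

An empty list from this helper would suggest that no characters were escaped in the tokenized text.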
Thank you for posting the issue. I have been able to replicate this behavior. This turned out to be a packaging error, not a code/model error, so I am going to force-update the v1.1.1 tag. The original snippet should now work, even if you do not explicitly pass do_lower_case=True, since it is the config default. Let me know if this works on your end, and again, thanks for the report.
diyclassics changed discussion status to closed