Tokie
tokiers
Pre-built .tkz tokenizer files for tokie — the fast, correct Rust tokenizer.
What is tokie?
tokie is a Rust tokenizer library designed as a drop-in replacement for HuggingFace tokenizers. The project reports 50x faster tokenization, 10x smaller model files, and 100% accuracy.
It supports BPE (GPT-2, tiktoken, SentencePiece), WordPiece (BERT), and Unigram (T5/XLM-R) encoders, with a custom .tkz binary format that loads in ~5ms.
What's on this org?
This organization hosts pre-built .tkz tokenizer files for popular models. Each repo contains the original model's tokenizer converted to tokie's compact binary format.
Using a model from this org
use tokie::Tokenizer;
// Loads tokenizer.tkz from this org automatically
let tokenizer = Tokenizer::from_pretrained("tokiers/ms-marco-MiniLM-L-6-v2")?;
let tokens = tokenizer.encode("Hello, world!", true);
from_pretrained() tries .tkz first, then falls back to tokenizer.json — so these repos are fully compatible with the standard HuggingFace loading flow.
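The lookup order described above can be illustrated with a minimal sketch. This is not tokie's actual implementation: the helper name pick_tokenizer_file and the CANDIDATES constant are hypothetical, and the sketch only assumes the two file names mentioned here (tokenizer.tkz and tokenizer.json), tried in that order.

```rust
/// File names tried in priority order: the compact .tkz format first,
/// then the standard HuggingFace tokenizer.json as a fallback.
/// (Hypothetical constant, for illustration only.)
const CANDIDATES: [&str; 2] = ["tokenizer.tkz", "tokenizer.json"];

/// Hypothetical helper: given the file names available in a repo,
/// return the first candidate that is present, or None if neither exists.
fn pick_tokenizer_file(available: &[&str]) -> Option<&'static str> {
    CANDIDATES
        .iter()
        .copied()
        .find(|name| available.iter().any(|file| *file == *name))
}
```

Under these assumptions, a repo that ships both files resolves to tokenizer.tkz, while a repo that only has tokenizer.json still loads through the fallback.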
Links
- GitHub: chonkie-inc/tokie
- crates.io: tokie
- Built by: chonkie-inc