tokiers

Pre-built .tkz tokenizer files for tokie, the fast, correct Rust tokenizer.


What is tokie?

tokie is a Rust tokenizer library designed as a drop-in replacement for HuggingFace tokenizers. Compared with HuggingFace tokenizers, it claims 50x faster tokenization, 10x smaller tokenizer files, and 100% accurate (matching) output.

It supports BPE (GPT-2, tiktoken, SentencePiece), WordPiece (BERT), and Unigram (T5/XLM-R) encoders, and stores tokenizers in a custom .tkz binary format that loads in about 5 ms.
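To make one of these encoder families concrete, here is a minimal, generic sketch of WordPiece's greedy longest-match-first lookup for a single pre-split word. It illustrates the standard algorithm only and is not tokie's implementation; the function name `wordpiece` and the toy vocabulary are hypothetical, and a real BERT tokenizer also performs normalization, pre-tokenization, and a maximum-word-length check, all omitted here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: greedy longest-match-first WordPiece for one word.
/// Continuation pieces carry the "##" prefix, as in BERT vocabularies.
/// Returns None if some remaining suffix has no vocabulary match
/// (a full tokenizer would emit [UNK] for the whole word in that case).
fn wordpiece(word: &str, vocab: &HashSet<&str>) -> Option<Vec<String>> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while start < end {
            let sub: String = chars[start..end].iter().collect();
            let candidate = if start > 0 { format!("##{}", sub) } else { sub };
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        // No prefix of the remaining characters matched: give up on this word.
        pieces.push(found?);
        start = end;
    }
    Some(pieces)
}
```

With a toy vocabulary containing `un`, `##aff`, and `##able`, the word "unaffable" splits into `un`, `##aff`, `##able`; the greedy choice always takes the longest piece available at the current position.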

What's in this org?

This organization hosts pre-built .tkz tokenizer files for popular models. Each repo contains the original model's tokenizer converted to tokie's compact binary format.

Using a model from this org

use tokie::Tokenizer;

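// Loads tokenizer.tkz from this org automatically.
// The `?` operator requires this to run inside a function that returns a Result.
let tokenizer = Tokenizer::from_pretrained("tokiers/ms-marco-MiniLM-L-6-v2")?;

// Encode a string. The `true` flag is assumed here to toggle special tokens,
// as with add_special_tokens in HuggingFace tokenizers.
let tokens = tokenizer.encode("Hello, world!", true);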

from_pretrained() looks for tokenizer.tkz first and falls back to tokenizer.json, so these repos remain fully compatible with the standard HuggingFace loading flow.
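The lookup order can be sketched for a repo that has already been downloaded to a local directory. `resolve_tokenizer_file` is a hypothetical helper written only to illustrate the preference for .tkz over tokenizer.json; it is not part of tokie's API, and the real from_pretrained() also handles fetching files from the Hub.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not tokie's API): mirrors the documented lookup order
/// of from_pretrained() for a local repo directory. Returns the path of the
/// compact tokenizer.tkz if present, else tokenizer.json, else None.
fn resolve_tokenizer_file(repo_dir: &Path) -> Option<PathBuf> {
    // Order matters: the binary format wins whenever both files exist.
    ["tokenizer.tkz", "tokenizer.json"]
        .iter()
        .map(|name| repo_dir.join(name))
        .find(|path| path.is_file())
}
```

A repo that ships only tokenizer.json still resolves, which is why the fallback keeps existing HuggingFace-style repos loadable.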
