# Kimi-K2-Thinking Tokenizer

Tokenizer files for moonshotai/Kimi-K2-Thinking, a trillion-parameter thinking model.
## 📦 What's Included

- `tiktoken.model` - Original tiktoken tokenizer (2.8 MB)
- `tokenizer.json` - HuggingFace-compatible format
- `tokenization_kimi.py` - Custom tokenization code
- `tokenizer_config.json` - Configuration
- `special_tokens_map.json` - Special tokens
- `chat_template.jinja` - Chat template
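After downloading a local snapshot (e.g. with `huggingface-cli download Zaynoid/Kimi-K2-Thinking-Tokenizer`), you can verify it is complete. The helper below is an illustrative sketch, not a file shipped with the repo; only the file names are taken from the list above:

```python
from pathlib import Path

# File names copied verbatim from the list above.
EXPECTED_FILES = [
    "tiktoken.model",
    "tokenizer.json",
    "tokenization_kimi.py",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]

def missing_files(snapshot_dir: str) -> list[str]:
    """Return the expected tokenizer files absent from a local snapshot."""
    root = Path(snapshot_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty result means all six files are present.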
## 🚀 Quick Start

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True,
)

text = "Hello, how are you?"
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
print(f"Decoded: {tokenizer.decode(tokens)}")
```
## 📊 Specifications

- Vocab Size: 163,840 tokens
- Model Type: Tiktoken (BPE-based)
- Context Length: 256K tokens
- Special Tokens:
  - BOS: `<｜begin▁of▁sentence｜>` (ID: 163584)
  - EOS: `<｜end▁of▁sentence｜>` (ID: 163585)
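The BOS/EOS IDs sit near the top of the vocabulary, which suggests a band of slots set aside for special tokens; that interpretation is an inference from the numbers above, not something the repo states. A quick sanity check of the quoted figures (IDs hard-coded here, nothing is read from the repo):

```python
# Numbers copied from the specifications above.
VOCAB_SIZE = 163_840

SPECIAL_TOKENS = {
    "<｜begin▁of▁sentence｜>": 163_584,  # BOS
    "<｜end▁of▁sentence｜>": 163_585,    # EOS
}

# Every special-token ID must fit inside the vocabulary.
for token, token_id in SPECIAL_TOKENS.items():
    assert 0 <= token_id < VOCAB_SIZE, f"{token} out of range"

# Slots between the lowest special ID and the end of the vocab.
reserved_slots = VOCAB_SIZE - min(SPECIAL_TOKENS.values())
print(reserved_slots)  # → 256
```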
## 💡 Usage Notes

- Recommended: use `trust_remote_code=True` for full compatibility
- The `tokenizer.json` is provided for tools that require it
- The original model uses the tiktoken format natively
## 🔧 Use with vLLM

```python
from vllm import LLM

llm = LLM(
    model="moonshotai/Kimi-K2-Thinking",
    tokenizer="Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True,
)
```
## 📄 License
Modified MIT License (same as base model)
## 🙏 Credits
- Original Model: Moonshot AI
- Architecture: Based on DeepSeek-V3
- Tokenizer Extraction: Zaynoid