
Kimi-K2-Thinking Tokenizer

Tokenizer files for moonshotai/Kimi-K2-Thinking - a trillion-parameter thinking model.

πŸ“¦ What's Included

  • tiktoken.model - Original tiktoken tokenizer (2.8 MB)
  • tokenizer.json - HuggingFace compatible format
  • tokenization_kimi.py - Custom tokenization code
  • tokenizer_config.json - Configuration
  • special_tokens_map.json - Special tokens
  • chat_template.jinja - Chat template

πŸš€ Quick Start

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True
)

text = "Hello, how are you?"
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
print(f"Decoded: {tokenizer.decode(tokens)}")
```
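For chat use, the bundled chat_template.jinja is applied via `tokenizer.apply_chat_template(messages, tokenize=False)`. The sketch below mimics that rendering locally with a stand-in template so the shape of the output is visible; the role markers here are illustrative, not the exact markers from the real template:

```python
# Sketch of what apply_chat_template produces: each turn wrapped in role
# markers. The <|im_start|>/<|im_end|> markers are a placeholder format,
# not necessarily the ones in chat_template.jinja.
def render_chat(messages):
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    )

messages = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "Hello, how are you?"},
]

prompt = render_chat(messages)
print(prompt)
```

The rendered string is what actually gets encoded, so prompt construction and tokenization stay decoupled.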

πŸ“Š Specifications

  • Vocab Size: 163,840 tokens
  • Model Type: Tiktoken (BPE-based)
  • Context Length: 256K tokens
  • Special Tokens:
    • BOS: <|begin▁of▁sentence|> (ID: 163584)
    • EOS: <|end▁of▁sentence|> (ID: 163585)
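As a quick sanity check on the numbers above: both special-token IDs sit in the last 256 slots of the vocabulary (163,840 − 163,584 = 256). Treating that range as a reserved special-token region is an assumption, but it is consistent with the IDs listed:

```python
# Constants restated from the specifications above.
VOCAB_SIZE = 163_840
BOS_ID = 163_584   # <|begin▁of▁sentence|>
EOS_ID = 163_585   # <|end▁of▁sentence|>

# The special tokens occupy the top of the ID space; 256 reserved slots
# is an inference from the IDs, not a documented guarantee.
special_region_start = VOCAB_SIZE - 256
assert special_region_start == BOS_ID
assert special_region_start <= EOS_ID < VOCAB_SIZE
```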

πŸ’‘ Usage Notes

  1. Recommended: load with trust_remote_code=True so the bundled tokenization_kimi.py is used, for full compatibility
  2. tokenizer.json is provided for tools that require the HuggingFace tokenizers format
  3. The original model uses the tiktoken format natively (tiktoken.model)
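Tools that consume tokenizer.json directly can load it with the tokenizers library via `Tokenizer.from_file`, with no custom code involved. The sketch below builds a toy file in the same serialization format (a tiny WordLevel model standing in for the real 163K-entry BPE vocabulary) just to show the call:

```python
import json
import os
import tempfile

from tokenizers import Tokenizer

# A toy tokenizer.json in the HuggingFace tokenizers serialization format.
# The real file uses a large BPE model; WordLevel keeps the example small.
toy = {
    "version": "1.0",
    "truncation": None,
    "padding": None,
    "added_tokens": [],
    "normalizer": None,
    "pre_tokenizer": {"type": "Whitespace"},
    "post_processor": None,
    "decoder": None,
    "model": {
        "type": "WordLevel",
        "vocab": {"Hello": 0, "world": 1, "[UNK]": 2},
        "unk_token": "[UNK]",
    },
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(toy, f)
    path = f.name

# The same call works for the real tokenizer.json from this repo.
tok = Tokenizer.from_file(path)
ids = tok.encode("Hello world").ids
print(ids)  # [0, 1]
os.unlink(path)
```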

πŸ”§ Use with vLLM

```python
from vllm import LLM

llm = LLM(
    model="moonshotai/Kimi-K2-Thinking",
    tokenizer="Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True
)
```

πŸ“ License

Modified MIT License (same as base model)

πŸ™ Credits

  • Original Model: Moonshot AI
  • Architecture: Based on DeepSeek-V3
  • Tokenizer Extraction: Zaynoid
