Instructions for using TilQazyna/Til-Core-1B-Instruct with libraries and local serving apps.

Libraries

How to use TilQazyna/Til-Core-1B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TilQazyna/Til-Core-1B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TilQazyna/Til-Core-1B-Instruct")
# This is a text-only causal LM, so load it with AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained("TilQazyna/Til-Core-1B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use TilQazyna/Til-Core-1B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TilQazyna/Til-Core-1B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TilQazyna/Til-Core-1B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
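
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server exposes the usual OpenAI-compatible /v1/chat/completions route and response shape. The helper names (build_chat_request, extract_reply, chat) and the max_tokens default are illustrative assumptions, not part of vLLM or of this model card.

```python
import json
import urllib.request


def build_chat_request(prompt, model="TilQazyna/Til-Core-1B-Instruct",
                       base_url="http://localhost:8000/v1", max_tokens=160):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed default, tune as needed
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text of the first choice in a chat completion response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return extract_reply(resp.read())


# Usage (requires a running server):
#   print(chat("Денсаулықты сақтаудың үш кеңесін айт."))
```

The same helper should work against any server that speaks the same API by passing a different base_url, for example base_url="http://localhost:30000/v1" for a server listening on port 30000.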

SGLang

How to use TilQazyna/Til-Core-1B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TilQazyna/Til-Core-1B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TilQazyna/Til-Core-1B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TilQazyna/Til-Core-1B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TilQazyna/Til-Core-1B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use TilQazyna/Til-Core-1B-Instruct with Docker Model Runner:
```
docker model run hf.co/TilQazyna/Til-Core-1B-Instruct
```

Access is gated: the repository is publicly listed, but you must accept its conditions and agree to share your contact information before you can access its files and content.

Til Core 1B Instruct

Chat/instruct version of TilQazyna/Til-Core-1B, supervised-fine-tuned on native-Kazakh instruction–response pairs (ChatML format, assistant-only loss). No translated data, no eval-set contamination.

⚠️ Early v1 / research preview. Follows the chat format and answers in Kazakh, but factual accuracy is limited (1.25B params, small SFT set). Not for production or factual reliance.

Details


Base	Til-Core-1B (1.246B params, morphbpe-256k)
SFT data	AmanMussa/kazakh-instruction-v2 (52,173 native-Kazakh Alpaca-style pairs)
Format	ChatML (`<|im_start|>` / `<|im_end|>` turn markers)
Loss	assistant tokens only
Recipe	3 epochs, LR 1e-5 cosine, bf16, 8×H200 FSDP
Stop token	`<|im_end|>`
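
To make the Format and Loss rows concrete, here is a minimal sketch of ChatML rendering and assistant-only label masking. It assumes the standard ChatML layout (each turn wrapped in `<|im_start|>role` and `<|im_end|>`); the chat template shipped with the tokenizer may differ in details. The names render_chatml, assistant_only_labels and toy_encode are hypothetical and not taken from the training code, and the character-level toy_encode stands in for the real tokenizer.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def render_chatml(messages, add_generation_prompt=False):
    """Render messages in the standard ChatML layout (assumed, see lead-in)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # cue the model to answer
    return text


def assistant_only_labels(messages, encode):
    """Return (input_ids, labels) where only assistant replies carry a loss.

    Turn headers and all non-assistant turns are masked with IGNORE_INDEX;
    the assistant content and its closing <|im_end|> keep their token ids.
    """
    input_ids, labels = [], []
    for m in messages:
        header = encode(f"<|im_start|>{m['role']}\n")
        body = encode(f"{m['content']}<|im_end|>\n")
        input_ids += header + body
        if m["role"] == "assistant":
            labels += [IGNORE_INDEX] * len(header) + body
        else:
            labels += [IGNORE_INDEX] * (len(header) + len(body))
    return input_ids, labels


def toy_encode(text):
    """Toy character-level tokenizer, used only to keep the sketch self-contained."""
    return [ord(ch) for ch in text]


example = [
    {"role": "user", "content": "Сәлем!"},
    {"role": "assistant", "content": "Сәлеметсіз бе!"},
]
ids, labels = assistant_only_labels(example, toy_encode)
```

Keeping the closing `<|im_end|>` inside the supervised span is what teaches the model to emit the stop token at the end of its reply.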

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "TilQazyna/Til-Core-1B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
# Load in bf16 on the GPU; older transformers releases spell the keyword `torch_dtype`.
m = AutoModelForCausalLM.from_pretrained(name, dtype=torch.bfloat16).cuda().eval()

# Prompt: "Give three tips for staying healthy."
msg = [{"role": "user", "content": "Денсаулықты сақтаудың үш кеңесін айт."}]
# Render the chat prompt as text, then tokenize without adding special tokens a second time.
p = tok.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
ids = tok(p, add_special_tokens=False, return_tensors="pt").input_ids.cuda()
# Sample a reply and stop at the ChatML end-of-turn token.
out = m.generate(ids, max_new_tokens=160, do_sample=True, temperature=0.7,
                 top_p=0.9, repetition_penalty=1.2,
                 eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"))
# Decode only the newly generated tokens.
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
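
For readers unfamiliar with the temperature and top_p settings used above, the following is a conceptual sketch of temperature scaling followed by nucleus (top-p) filtering in plain Python. It illustrates the idea only and is not the Transformers implementation; top_p_filter and sample_token are hypothetical names, and repetition_penalty is not modelled here.

```python
import math
import random


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_id: prob} over the smallest set of tokens whose
    temperature-scaled probabilities sum to at least top_p."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the nucleus is full.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = top_p_filter(logits, **kwargs)
    token_ids = list(dist)
    return rng.choices(token_ids, weights=[dist[i] for i in token_ids], k=1)[0]
```

Lower temperature sharpens the distribution, and lower top_p shrinks the candidate set, so both push generation toward the most likely tokens.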

Example

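User: Қазақстанның астанасы қай қала және ол туралы қысқаша айт. (English: "Which city is the capital of Kazakhstan? Tell me briefly about it.")
Assistant: Қазақстанның елордасы — Астана қаласы. Ол Есіл өзенінің жағасында орналасқан… (English: "The capital of Kazakhstan is the city of Astana. It lies on the bank of the Esil River…")

User: Денсаулықты сақтаудың үш кеңесін айт. (English: "Give three tips for staying healthy.")
Assistant: 1. Салауатты өмір салтын ұстану; 2. Дұрыс тамақтану; 3. Тұрақты дене жаттығулары… (English: "1. Maintain a healthy lifestyle; 2. Eat properly; 3. Regular physical exercise…")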

Limitations

Small model + small SFT set → weak factual accuracy, occasional topic drift.
No RLHF / safety alignment.
Kazakh-only.

Roadmap

Larger / cleaner SFT set, preference tuning.
A smaller on-device instruct sibling.
Task-specialized variants (e.g. Kazakh grammar correction — see Til-Core experiments).

Model size (Safetensors): 2B params
Tensor type: BF16
Base model: TilQazyna/Til-Core-1B (this model is a fine-tune of it)