Quark-270M-Instruct - Bilingual Chat Model

Quark-270M-Instruct is the instruction-tuned version of Quark-270M Base, fine-tuned for conversational use in Italian and English. Built entirely from scratch by ThingsAI.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model (it ships custom code) in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "ThingAI/Quark-270m-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).cuda()
model.lm_head.weight = model.embed_tokens.weight  # ensure weight tying

tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-270m-Instruct")

# Single-turn prompt in the chat format; it ends with an open assistant turn.
prompt = "<|user|>\nCiao, come stai?\n<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Sample up to 150 new tokens.
out = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_k=40)
# Special tokens are kept so the chat markers stay visible in the output.
print(tokenizer.decode(out[0], skip_special_tokens=False))
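
Because the snippet decodes with skip_special_tokens=False, the printed text contains the prompt and the chat markers as well as the reply. A minimal sketch of isolating the reply, assuming the decoded text reproduces the <|assistant|> and <|end|> markers literally; extract_reply is a hypothetical helper, not part of the model's code:

```python
def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded generation."""
    # Everything after the final assistant marker belongs to the newest reply.
    _, _, tail = decoded.rpartition("<|assistant|>")
    # Cut at the first end marker; if generation hit the token limit there is none.
    reply, _, _ = tail.partition("<|end|>")
    return reply.strip()


decoded = "<|user|>\nCiao, come stai?\n<|end|>\n<|assistant|>\nBene, grazie!\n<|end|>"
print(extract_reply(decoded))
```

If the model keeps generating past its own <|end|> marker, this keeps only the first reply and discards the rest.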

Chat Format

<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>

Multi-turn:

<|user|>
Ciao!
<|end|>
<|assistant|>
Ciao! Come posso aiutarti?
<|end|>
<|user|>
Cos'è l'intelligenza artificiale?
<|end|>
<|assistant|>
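
The format above can be assembled programmatically. The following is a sketch only: build_prompt and the role/content message dictionaries are hypothetical conventions chosen for illustration, and the exact newline placement is inferred from the quick-start prompt string.

```python
def build_prompt(messages: list[dict[str, str]]) -> str:
    """Render a conversation in the <|user|> / <|assistant|> / <|end|> format."""
    parts = []
    for message in messages:
        role = message["role"]
        # Only the two roles shown in the chat format are accepted.
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}\n<|end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|assistant|>\n")
    return "".join(parts)


history = [
    {"role": "user", "content": "Ciao!"},
    {"role": "assistant", "content": "Ciao! Come posso aiutarti?"},
    {"role": "user", "content": "Cos'è l'intelligenza artificiale?"},
]
print(build_prompt(history))
```

A "system" role is rejected on purpose, since the model was not trained with <|system|> tags.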

Model Details

Base Model: Quark-270M Base
Parameters: 252M (with weight tying)
Architecture: Decoder-only Transformer (GQA, SwiGLU, RMSNorm, RoPE)
Vocabulary: 65,537 tokens
Context Length: 2,048 tokens
Precision: BF16
Languages: Italian, English

Architecture

d_model: 768
Layers: 32
Query Heads: 12
KV Heads: 4
Head Dim: 64
FFN Dim: 2,048
Activation: SwiGLU
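
As a sanity check, the 252M figure can be approximately reproduced from these dimensions. The sketch below rests on assumptions not stated in this card: no bias terms, two RMSNorm weight vectors per layer plus one final norm, three SwiGLU projection matrices (gate, up, down), and an output head tied to the embedding matrix. RoPE adds no parameters.

```python
d_model, n_layers = 768, 32
n_q_heads, n_kv_heads, head_dim = 12, 4, 64
ffn_dim, vocab_size = 2048, 65537

# Token embeddings; the output head reuses this matrix (weight tying).
embedding = vocab_size * d_model

# Grouped-query attention: full-width Q and O, narrower shared K and V.
q_proj = d_model * n_q_heads * head_dim
kv_proj = 2 * d_model * n_kv_heads * head_dim
o_proj = n_q_heads * head_dim * d_model
attention = q_proj + kv_proj + o_proj

# SwiGLU feed-forward: gate, up, and down projections (assumed, no biases).
swiglu = 3 * d_model * ffn_dim

# Two RMSNorm weight vectors per layer (assumed).
norms = 2 * d_model

per_layer = attention + swiglu + norms
total = embedding + n_layers * per_layer + d_model  # + final RMSNorm (assumed)

print(f"{total:,}")  # 251,708,928
```

Under these assumptions the total is about 251.7M, which rounds to the stated 252M.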

Training

Base Pretraining

The base model was pretrained on ~10B tokens of a bilingual mix (Italian 50%, English 43%, Code 7%) on NVIDIA B200 hardware. See Quark-270M Base for details.

SFT (Instruction Tuning)

Fine-tuned on a diverse mix of conversational and instructional data:

Dataset                                    Examples   Type
FreedomIntelligence/alpaca-gpt4-italian    ~52,000    Italian instructions
HuggingFaceH4/no_robots                    ~9,500     English conversations
m-a-p/CodeFeedback-Filtered-Instruction    5,000      Code instructions
yogeshm/text_to_bash (×80)                 ~9,900     Terminal commands
Custom chitchat (×100)                     ~3,000     Identity, greetings, basic Q&A
Total                                      ~80,000

Hardware: NVIDIA B200
Epochs: 3
Learning Rate: 2e-5 (cosine decay)
Batch Size: 16 × 4 = 64 effective
Sequence Length: 512
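
To make these hyperparameters concrete, the sketch below estimates the number of optimizer steps and evaluates a cosine learning-rate schedule. It assumes details this card does not state: that each of the ~80,000 examples is one training sequence (no packing), that there is no warmup, and that the rate decays to zero. The actual training script may differ.

```python
import math

num_examples = 80_000      # approximate total from the dataset table
effective_batch = 16 * 4   # 64 sequences per optimizer step
epochs = 3
base_lr = 2e-5

steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs


def cosine_lr(step: int) -> float:
    """Cosine decay from base_lr at step 0 to zero at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions training runs for roughly 3,750 optimizer steps, with the learning rate at half its peak at the midpoint.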

Inference Server

Quark-270M-Instruct powers Things Chat via a self-hosted FastAPI server with SSE streaming, conversation memory, web search, and content moderation.
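
The server code itself is not shown in this card. As a rough illustration of the SSE part only, the sketch below frames generated text chunks as server-sent events using the standard library; sse_events is a hypothetical helper, and this is not the Things Chat implementation.

```python
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each text chunk in server-sent event framing."""
    for chunk in chunks:
        # A payload containing newlines must be sent as several "data:" lines;
        # the client joins them back together with "\n".
        lines = chunk.split("\n")
        # A blank line terminates the event.
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


for event in sse_events(["Ciao", "!\nCome stai?"]):
    print(repr(event))
```

In a web framework, a generator like this would typically be returned as a streaming response with the text/event-stream media type.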

Limitations

  • 252M is small: Limited factual knowledge, prone to hallucination
  • Mathematics: Unreliable beyond basic arithmetic
  • Code: Generates plausible but often non-functional code
  • Context: 2,048 token window
  • No system prompt: The model was not trained with <|system|> tags
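
One way to work within the 2,048-token window in multi-turn use is to drop the oldest turns before building the prompt. The sketch below is only an illustration: trim_history and the count_tokens callable are hypothetical, the role/content message dictionaries are an assumed convention, and the budget ignores the few tokens taken by the chat markers.

```python
from typing import Callable


def trim_history(
    messages: list[dict[str, str]],
    count_tokens: Callable[[str], int],
    context_length: int = 2048,
    max_new_tokens: int = 150,
) -> list[dict[str, str]]:
    """Drop the oldest messages until the history fits the prompt budget."""
    # Reserve room in the window for the tokens the model will generate.
    budget = context_length - max_new_tokens
    kept = list(messages)

    def total(msgs: list[dict[str, str]]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Always keep the newest message; make the history start with a user turn.
    while len(kept) > 1 and (total(kept) > budget or kept[0]["role"] != "user"):
        kept.pop(0)
    return kept


def word_count(text: str) -> int:
    # Stand-in counter for the demo; real use would count tokenizer tokens.
    return len(text.split())


history = [
    {"role": "user", "content": "a b c d"},
    {"role": "assistant", "content": "e f g"},
    {"role": "user", "content": "h i"},
]
print(trim_history(history, word_count, context_length=10, max_new_tokens=4))
```

With the real tokenizer, count_tokens could be something like lambda text: len(tokenizer(text)["input_ids"]).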

Good for

  • Self-hosted bilingual chatbot
  • Learning about LLM training from scratch
  • Terminal command assistance
  • Light conversational AI

Not suited for

  • Factual Q&A requiring accuracy
  • Complex reasoning or math
  • Production-grade code generation
  • Safety-critical applications

The Quark Family

Model                  Parameters   Type
Quark-50M              51M          Base
Quark-135M             135M         Base
Quark-270M Base        252M         Base
Quark-270M-Instruct    252M         Chat


Built from scratch by ThingsAI 🇮🇹
