🌸 SEA Model series Op.0: Saint Iberis d16 (Parameters: 376M), Japanese-English Bilingual Model
This repository uses SLC2, a module inspired by LTCs and LFM2, to make nanochat faster to train and run. Compared with the original nanoGPT, SEA Model series Op.0: Saint Iberis achieves comparable performance while reducing training time. The model is freely available from the repository below.
Repository: Liquid_Time_nanochat_jp
🌸 Saint Iberis Architecture
| Property | Saint Iberis d16 | Remarks |
|---|---|---|
| Total parameters | 376,240,128 (376M) | n_layer: 16, n_head: 16, n_kv_head: 16, n_embd: 1024 |
| Layers | 16 (9 slc2 + 7 attn) | attn layers: 1, 4, 7, 10, 11, 14, 15 |
| Vocabulary size | 65,536 | - |
| License | Apache | - |
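The hybrid schedule in the table can be written down directly; a minimal sketch (the `ATTN_LAYERS` name and the list construction are illustrative, not the repository's code):

```python
# Hybrid layer schedule from the table above: 16 blocks in total,
# attention at the listed indices, SLC2 everywhere else.
ATTN_LAYERS = {1, 4, 7, 10, 11, 14, 15}
layer_types = ["attn" if i in ATTN_LAYERS else "slc2" for i in range(16)]
assert layer_types.count("attn") == 7 and layer_types.count("slc2") == 9
```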
🌸 SLC2 Formulation
$$y = B \cdot \prod_{i=j}^{j+k} A_i \, x_i$$

where $A_i$ and $B$ are input-dependent gates produced from $x$ by a linear projection, as shown in the pseudocode below.
🌸 SLC2 Pseudocode

```
Algorithm: SLC2
Input:  x: (B, S, E)
Output: y: (B, S, E)
1: alpha, A, B, x1 <- Linear(x)
2: x2: (B, S, E) <- Convolution1D(E, E)(SiLU(alpha) * A * x1)
3: x3: (B, S, E) <- B * SiLU(x2)
4: y:  (B, S, E) <- Linear(x3)
5: return y
```
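To make the data flow concrete, here is a minimal PyTorch sketch of an SLC2 block that follows the pseudocode above. The class name, the 4-way split of the input projection, the kernel size, and the causal left-padding are illustrative assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SLC2(nn.Module):
    """Minimal sketch of an SLC2 block per the pseudocode above.
    Names, kernel size, and padding choice are assumptions."""

    def __init__(self, n_embd: int, kernel_size: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(n_embd, 4 * n_embd)  # -> alpha, A, B, x1
        self.conv = nn.Conv1d(n_embd, n_embd, kernel_size)
        self.out_proj = nn.Linear(n_embd, n_embd)
        self.kernel_size = kernel_size

    def forward(self, x):  # x: (B, S, E)
        # Step 1: one projection yields the gates and the value path.
        alpha, A, B, x1 = self.in_proj(x).chunk(4, dim=-1)
        # Step 2: elementwise gate, then a causal 1D convolution over S.
        gated = F.silu(alpha) * A * x1
        g = F.pad(gated.transpose(1, 2), (self.kernel_size - 1, 0))
        x2 = self.conv(g).transpose(1, 2)  # back to (B, S, E)
        # Step 3: output gate; Step 4: final projection.
        x3 = B * F.silu(x2)
        return self.out_proj(x3)
```

Since the block is a short causal convolution wrapped in elementwise gates, its cost is linear in sequence length, which is consistent with the training-time reduction over pure attention claimed above.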
🌸 Performance
| Metric | BASE | MID | SFT | RL |
|---|---|---|---|---|
| CORE | 0.1501 | - | - | - |
| ARC-Challenge | - | 0.2491 | 0.2807 | - |
| ARC-Easy | - | 0.2563 | 0.2673 | - |
| GSM8K | - | 0.0167 | 0.0250 | - |
| HumanEval | - | 0.0305 | 0.0122 | - |
| MMLU | - | 0.2714 | 0.2735 | - |
| ChatCORE | - | 0.1785 | 0.1875 | - |
🌸 Training Results
Base Training
- Minimum validation bpb: 0.8436
- Final validation bpb: 0.8436
Mid Training
- Minimum validation bpb: 0.4561
SFT Training
- Training loss: 1.3444
- Validation loss: 1.1934
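For reference, bits per byte (bpb) normalizes validation cross-entropy by the UTF-8 byte count of the evaluated text, making the number independent of the tokenizer; a minimal definition, assuming per-token cross-entropy is measured in nats as in nanochat's evaluation:

$$\mathrm{bpb} = \frac{\sum_t \mathcal{L}_t}{\ln 2 \cdot N_{\mathrm{bytes}}}$$

where $\mathcal{L}_t$ is the cross-entropy of token $t$ and $N_{\mathrm{bytes}}$ is the total number of bytes in the evaluation text.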
🌸 Usage
Clone the repository:

```bash
git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat_jp.git
```
Then, you can run this inference snippet:
```python
import os
import sys
import json
import torch
from huggingface_hub import hf_hub_download

# Clone the repository if it is not already present, then import from it.
if not os.path.exists("Liquid_Time_nanochat_jp"):
    os.system("git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat_jp")
os.chdir("Liquid_Time_nanochat_jp")
sys.path.append(os.getcwd())

from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

# Download the checkpoint, config, and tokenizer from the Hugging Face Hub.
# The tokenizer is placed in the working directory so from_directory() finds it.
repo_id = "RikkaBotan/nanochat_saint_iberis_jp"
model_file = "model_000825.pt"
meta_file = "meta_000825.json"
tokenizer_file = "tokenizer.pkl"
local_pt_path = hf_hub_download(repo_id=repo_id, filename=model_file)
local_meta_path = hf_hub_download(repo_id=repo_id, filename=meta_file)
local_tokenizer_path = hf_hub_download(repo_id=repo_id, filename=tokenizer_file, local_dir=os.getcwd())

# Build the model from the stored config and load the weights.
with open(local_meta_path, "r", encoding="utf-8") as f:
    meta_data = json.load(f)
model_config = GPTConfig(**meta_data["model_config"])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPT(model_config).to(device)
state_dict = torch.load(local_pt_path, map_location=device)
# Strip the torch.compile prefix so keys match the uncompiled module.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True)
model.eval()

# Resolve the special tokens used by the nanochat chat format.
tokenizer = RustBPETokenizer.from_directory(os.getcwd())
try:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|bos|>")
except KeyError:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|endoftext|>")
tokenizer.user_start_id = tokenizer.enc.encode_single_token("<|user_start|>")
tokenizer.user_end_id = tokenizer.enc.encode_single_token("<|user_end|>")
tokenizer.assistant_start_id = tokenizer.enc.encode_single_token("<|assistant_start|>")
tokenizer.assistant_end_id = tokenizer.enc.encode_single_token("<|assistant_end|>")
tokenizer.stop_tokens = {tokenizer.assistant_end_id, tokenizer.bos_token_id}

def format_conversation(tokenizer, history):
    """Render a chat history into token ids using the nanochat chat template."""
    tokens = [tokenizer.bos_token_id]
    for message in history:
        role = message["role"]
        content = message["content"]
        content_tokens = tokenizer.encode(content)
        if role == "user":
            tokens.extend([tokenizer.user_start_id, *content_tokens, tokenizer.user_end_id])
        elif role == "assistant":
            tokens.extend([tokenizer.assistant_start_id, *content_tokens, tokenizer.assistant_end_id])
    # Leave the prompt open so the model continues as the assistant.
    tokens.append(tokenizer.assistant_start_id)
    return tokens

def generate_reply(prompt, conv_history, temperature=0.7, top_k=20, top_p=0.8,
                   repetition_penalty=1.15, max_new_tokens=64):
    """Append the user prompt, stream tokens from the model, and return the reply."""
    conv_history.append({"role": "user", "content": prompt})
    tokens = format_conversation(tokenizer, conv_history)
    input_ids = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(device)
    stream = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
    )
    buffer_text = ""
    for token_id in stream:
        text_piece = tokenizer.decode([token_id])
        if text_piece == "<|assistant_end|>":
            break
        buffer_text += text_piece
    conv_history.append({"role": "assistant", "content": buffer_text})
    return buffer_text

if __name__ == "__main__":
    print("🌸 NanoChat - Saint Iberis CLI")
    print("Type 'exit' to quit.\n")
    conv_history = []
    while True:
        prompt = input("You: ")
        if prompt.lower() in {"exit", "quit"}:
            print("Goodbye!")
            break
        reply = generate_reply(prompt, conv_history)
        print(f"AI: {reply}\n")
```
🌸 Acknowledgments
I thank Andrej Karpathy for nanochat, his full-stack project for building an LLM.
I thank the developers of Python and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this repository.
🌸 About us
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is a charm point. Interested in NLP. I usually use Python and C.
🌸 Evaluation Results
All scores are on the respective test sets.
| Benchmark | Setting | Metric | Score |
|---|---|---|---|
| AI2 Reasoning Challenge | 25-shot | normalized accuracy | 28.07 |
| AI2 Reasoning Easy | 25-shot | normalized accuracy | 26.73 |
| MMLU | 5-shot | accuracy | 27.35 |
| GSM8K | 5-shot | accuracy | 2.50 |
| HumanEval | - | pass@1 | 1.22 |
| ChatCORE | - | ChatCORE metric | 18.75 |


