KoHRM-Text-1.4B / README.md

gyung

Fix formatting token rendering in model card

7ef9fc1 verified 33 minutes ago

metadata

license: other
language:
  - ko
  - en
tags:
  - hrm-text
  - korean
  - terminal
  - tool-use
  - code
  - pretraining
  - prefix-lm
library_name: transformers

KoHRM-Text-1.4B

Language / 언어: English | 한국어

English

KoHRM-Text-1.4B is a Korean/English model for code, terminal, and tool-use text, pretrained from scratch with the sapientinc/HRM-Text PrefixLM training stack.

This is not a continued finetune of sapientinc/HRM-Text-1B. It uses a new 131K-vocabulary byte-level BPE tokenizer oriented toward Korean and terminal text, and its weights come from a new from-scratch training run.

Current Status

This repository is a rolling latest public model export. Training is still in progress.

Main repo: LLM-OS-Models/KoHRM-Text-1.4B
Current public files: model.safetensors, config.json, tokenizer files, and this README.md
Raw FSDP2 resume checkpoints: LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints
Prepared data: LLM-OS-Models/KoHRM-Text-1.4B-prepared-data
Project code: https://github.com/LLM-OS-Models/KoHRM-text
Upstream HRM-Text code: https://github.com/sapientinc/HRM-Text
HRM-Text paper: https://arxiv.org/html/2605.20613
Tokenizer repo: LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K

The main branch is overwritten with the newest converted EMA safetensors export whenever a new training checkpoint is converted and uploaded, at 10,000-step intervals. The file names stay the same, so to test the latest public weights, download revision="main".

Important Compatibility Note

The public repo currently contains the converted model weights and tokenizer, but it does not yet include a Hugging Face trust_remote_code modeling implementation for HrmTextForCausalLM.

What works today:

Download the latest public weights.
Load the tokenizer with AutoTokenizer.
Inspect config.json.
Verify model.safetensors on CPU or Colab T4.

What is not supported yet in plain Transformers:

AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")
One-line hosted text generation from this repo

Expected reason: model_type "hrm_text" is a custom HRM-Text architecture that plain Transformers does not recognize. Public generation will require either adding compatible HrmTextForCausalLM remote-code files to this model repo or releasing a standard wrapper.
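One way to confirm the gap described above without downloading weights is to look for an auto_map entry in config.json, which is the field Transformers consults when trust_remote_code=True is passed. The following is a minimal offline sketch, not part of the project code: the helper name has_remote_code_mapping is hypothetical, and the sample config only copies the highlights listed in this card, so the real file may contain additional keys.

```python
import json


def has_remote_code_mapping(config: dict, auto_class: str = "AutoModelForCausalLM") -> bool:
    """Return True if the config maps `auto_class` to a custom class via `auto_map`."""
    auto_map = config.get("auto_map") or {}
    return auto_class in auto_map


# Config highlights copied from this model card (no "auto_map" key is listed).
config = json.loads("""
{
  "model_type": "hrm_text",
  "architectures": ["HrmTextForCausalLM"],
  "vocab_size": 131072,
  "hidden_size": 1536,
  "num_hidden_layers": 32,
  "num_attention_heads": 12,
  "max_position_embeddings": 4096,
  "prefix_lm": true
}
""")

print("remote code mapping present:", has_remote_code_mapping(config))
```

Once remote-code files are added to the repo, the same check on the downloaded config.json should start returning True.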

Model Details

Field	Value
Model id	`LLM-OS-Models/KoHRM-Text-1.4B`
Standard name	`KoHRM-Text-1.4B`
Training origin	scratch
Architecture family	HRM-Text PrefixLM
Architecture size	`XL`
Parameters	1,384,120,320
Context length	4,096 tokens
Training dtype	bfloat16
Public export dtype	bfloat16 EMA `safetensors`
Tokenizer	byte-level BPE, NFC normalization
Vocabulary size	131,072
Objective	PrefixLM response-only loss
Optimizer	Adam-atan2 from upstream HRM-Text
EMA	0.9999

Converted config highlights:

{
  "model_type": "hrm_text",
  "architectures": ["HrmTextForCausalLM"],
  "vocab_size": 131072,
  "hidden_size": 1536,
  "num_hidden_layers": 32,
  "num_attention_heads": 12,
  "max_position_embeddings": 4096,
  "prefix_lm": true
}
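The figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in this card. It assumes the conventional head width of hidden_size / num_attention_heads and counts a single embedding matrix; whether the output head shares that matrix is not stated here.

```python
# Values copied from the model details table and the config highlights.
vocab_size = 131_072
hidden_size = 1_536
num_attention_heads = 12
total_params = 1_384_120_320
bytes_per_bf16 = 2

head_dim = hidden_size // num_attention_heads  # assumed: heads split hidden_size evenly
embedding_params = vocab_size * hidden_size    # one embedding matrix only
weight_bytes = total_params * bytes_per_bf16   # size of a pure bf16 export

print("head_dim:", head_dim)
print("embedding params:", f"{embedding_params:,}")
print("embedding share of total:", f"{embedding_params / total_params:.1%}")
print("bf16 weights:", f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
```

The byte count is consistent with the roughly 2.8 GB of CPU RAM mentioned for the weight check in the quick test.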

Tokenizer

The tokenizer was trained for Korean, English, code, shell/terminal text, and JSON/tool-call formats. It keeps common chat/tool special tokens as stable single tokens where possible.

Sample bucket	chars/token
Korean general text	2.60
Korean legal text	2.36
Korean terminal instruction	2.18
shell command	2.68
tool-call JSON	3.32
Python code	3.37
English	4.40
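To make the table more concrete, the sketch below converts each chars/token figure into a rough character budget for the 4,096-token context. It is an estimate only: the ratios are sample averages from the table, and real prompts will vary. The helper name approx_chars_in_context is hypothetical.

```python
CONTEXT_TOKENS = 4096  # context length from the model details table

# chars/token figures copied from the table above
CHARS_PER_TOKEN = {
    "Korean general text": 2.60,
    "Korean legal text": 2.36,
    "Korean terminal instruction": 2.18,
    "shell command": 2.68,
    "tool-call JSON": 3.32,
    "Python code": 3.37,
    "English": 4.40,
}


def approx_chars_in_context(chars_per_token: float, tokens: int = CONTEXT_TOKENS) -> int:
    """Rough number of characters that fit in `tokens` tokens (rounded down)."""
    return int(chars_per_token * tokens)


for bucket, ratio in CHARS_PER_TOKEN.items():
    print(f"{bucket:30s} ~{approx_chars_in_context(ratio):,} chars")
```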

Formatting tokens:

<|im_start|>         instruction start
<|im_end|>           instruction end
<|box_end|>          response/end marker
<|object_ref_start|> direct condition
<|object_ref_end|>   chain-of-thought style condition
<|quad_start|>       noisy condition
<|quad_end|>         synthetic condition

Prompt format used by the project-side inference code:

<|im_start|><|object_ref_start|>YOUR_PROMPT_HERE<|im_end|>
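The prompt format can be wrapped in a small helper. This is a sketch rather than the project's own code: build_prompt and the short condition names are hypothetical, and only the direct-condition layout is shown in this card, so placing the other condition tokens in the same slot is an assumption.

```python
# Condition tokens copied from the formatting-token list above.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "noisy": "<|quad_start|>",
    "synthetic": "<|quad_end|>",
}


def build_prompt(text: str, condition: str = "direct") -> str:
    """Wrap `text` in the instruction markers with one condition token.

    Assumption: every condition token occupies the slot that the card
    shows for the direct condition.
    """
    try:
        cond = CONDITION_TOKENS[condition]
    except KeyError:
        raise ValueError(f"unknown condition: {condition!r}") from None
    return f"<|im_start|>{cond}{text}<|im_end|>"


print(build_prompt("YOUR_PROMPT_HERE"))
```

When tokenizing the result, pass add_special_tokens=False as in the quick test, since the markers are already part of the string.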

CPU / Colab T4 Quick Test

Use this to test the latest public weight files on CPU or a Colab T4 runtime. This verifies that the tokenizer, config, and model.safetensors are downloadable and readable.

It does not run text generation yet, because the public repo does not yet ship the custom HRM-Text modeling wrapper.

# Colab/Jupyter cell syntax; in a regular shell, drop the leading "!".
!pip -q install -U huggingface_hub transformers safetensors accelerate

from pathlib import Path
import json
import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
from safetensors.torch import load_file

repo_id = "LLM-OS-Models/KoHRM-Text-1.4B"

repo_dir = Path(snapshot_download(
    repo_id,
    revision="main",
    allow_patterns=[
        "README.md",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "model.safetensors",
    ],
))

print("Downloaded to:", repo_dir)
print("Runtime:", "cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

config = json.loads((repo_dir / "config.json").read_text())
print("model_type:", config["model_type"])
print("hidden_size:", config["hidden_size"])
print("vocab_size:", config["vocab_size"])
print("context:", config["max_position_embeddings"])

tokenizer = AutoTokenizer.from_pretrained(repo_dir, use_fast=True)
prompt = "<|im_start|><|object_ref_start|>한국어로 현재 디렉터리에서 가장 큰 파일 10개를 찾는 명령을 알려주세요.<|im_end|>"
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
print("prompt tokens:", len(ids))
print("first token ids:", ids[:20])

# CPU weight integrity check. This loads about 2.8 GB of bf16 weights (1.38B parameters x 2 bytes) into CPU RAM.
state = load_file(str(repo_dir / "model.safetensors"), device="cpu")
num_tensors = len(state)
num_params = sum(t.numel() for t in state.values())
first_key = next(iter(state))

print("num_tensors:", num_tensors)
print("num_params:", f"{num_params:,}")
print("first tensor:", first_key, tuple(state[first_key].shape), state[first_key].dtype)

Expected result:

model_type should be hrm_text.
vocab_size should be 131072.
num_params should be around 1.38B.
Tokenizer loading should work on CPU and Colab T4.
AutoModelForCausalLM generation is expected to be unavailable until remote-code support is added.

If you try this:

from transformers import AutoModelForCausalLM
AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")

and it fails with an unknown hrm_text architecture, that is expected for the current public export.
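If loading about 2.8 GB into RAM is inconvenient, the tensor and parameter counts can also be read from the file header alone. A safetensors file starts with an 8-byte little-endian length followed by a JSON header that lists each tensor's dtype, shape, and data offsets. The sketch below relies only on that layout and the standard library; the function names are hypothetical.

```python
import json
import math
import struct
from pathlib import Path


def read_safetensors_header(path) -> dict:
    """Read only the JSON header of a .safetensors file (no tensor data)."""
    with Path(path).open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def summarize_header(header: dict) -> tuple[int, int]:
    """Return (num_tensors, num_params), skipping the optional __metadata__ entry."""
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    num_params = sum(math.prod(v["shape"]) for v in tensors.values())
    return len(tensors), num_params
```

With repo_dir from the quick test, summarize_header(read_safetensors_header(repo_dir / "model.safetensors")) should report the same counts as the full load. It does not verify the tensor data itself.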

Internal / Project-Side Generation

For internal evaluation, use the project code and raw FSDP2 checkpoints:

git clone https://github.com/LLM-OS-Models/KoHRM-text
cd KoHRM-text
pip install -r requirements.txt

Then load a raw checkpoint with simple_inference_engine.py. This path requires the raw checkpoint files and a CUDA environment with enough VRAM. It is not the recommended Colab T4 path yet.

Training Data

Prepared data artifacts are uploaded to:

https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data

The training objective is PrefixLM response-only loss. Instruction/prompt tokens are visible as context, while loss is applied to the response span.
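The objective can be illustrated with two small masks. This is a sketch under the common PrefixLM definition, in which prompt tokens are visible to every position and response tokens are attended causally. Whether the upstream HRM-Text code builds its masks exactly this way is an assumption, and both function names are hypothetical.

```python
def response_loss_mask(prompt_len: int, response_len: int) -> list[int]:
    """1 where a token counts toward the loss (response span), 0 for the prompt."""
    return [0] * prompt_len + [1] * response_len


def prefix_lm_attention(prompt_len: int, total_len: int) -> list[list[bool]]:
    """allowed[i][j] is True when position i may attend to position j.

    Prompt positions are visible to every position; response positions are
    visible only to themselves and to later positions (causal).
    """
    return [
        [j < prompt_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


mask = response_loss_mask(prompt_len=3, response_len=2)
allowed = prefix_lm_attention(prompt_len=3, total_len=5)
print("loss mask:", mask)
for row in allowed:
    print("".join("x" if ok else "." for ok in row))
```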

Major prepared data groups:

Dataset group	Tokens	Use
`koterm_pretrain_mix_v1`	711.3M	stage0/stage0b
HRM cleaned fast-cap stage1/stage1b	14.55B	HRM-style instruction pretraining
HRM cleaned full/no-cap stage2	14.55B	completed continuation
HRM cleaned full/no-cap extra stage2b	14.55B	scheduled continuation
Local terminal conversations	9.39B	terminal/code/tool-heavy continuation
Korean tool/legal/wiki/finance mix	3.02B	Korean domain and tool continuation
BCAI Finance Korean	857.7M	Korean finance/domain data
Korean legal/admin task data	629.0M	Korean legal/admin data
Korean Wikipedia	462.5M	Korean general text
ToolBench train tool-call data	127.0M	tool-call pretraining
SWE-ZERO + GLM reasoning subsets	251.2M	code/reasoning data

Datasets that resemble evaluation sets are excluded from training where identified, including ToolBench eval, Terminal Bench-style evaluation data, and the benchmark-oriented chi-bench data.

Training Run

The current run uses staged continuation:

stage0
-> stage0b
-> stage1
-> stage2
-> stage3
-> stage4
-> stage1b
-> stage2b
-> stage3b
-> stage4b

Each checkpoint carries model weights, optimizer state, EMA weights, and recurrent carry state across stages. resume_step_offset and total_steps_override are set so that the learning-rate schedule follows the intended longer run instead of resetting at each stage.
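The effect of the two settings can be sketched with a toy schedule. Everything other than the parameter names resume_step_offset and total_steps_override is an assumption made for illustration: the warmup-plus-cosine shape, peak_lr, warmup_steps, and min_ratio are placeholders, not the project's actual values.

```python
import math


def lr_at(local_step: int, *, resume_step_offset: int, total_steps_override: int,
          peak_lr: float = 1e-4, warmup_steps: int = 1000, min_ratio: float = 0.1) -> float:
    """Learning rate for a stage-local step, evaluated on the global schedule."""
    step = resume_step_offset + local_step  # position in the full run
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    decay_steps = max(1, total_steps_override - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)


# The last step of one stage and the first step of the next land on the same
# point of the schedule, so the learning rate does not jump at the handoff.
end_of_stage = lr_at(5000, resume_step_offset=0, total_steps_override=100_000)
start_of_next = lr_at(0, resume_step_offset=5000, total_steps_override=100_000)
print(end_of_stage == start_of_next)
```

Without the offset, the second stage would restart at local step 0 and re-enter warmup, which is the reset the text describes avoiding.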

As of 2026-05-27, stage1b is active and stage2b -> stage3b -> stage4b are scheduled through a handoff watcher. The handoff reads the actual epoch_1_info.json global_step from each completed checkpoint before starting the next stage.
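The handoff step can be pictured as a small file read. The sketch below assumes that epoch_1_info.json sits directly in the checkpoint directory, that it holds a top-level integer global_step, and that this value becomes the next stage's resume_step_offset. None of these details are confirmed by this card, and the function name is hypothetical.

```python
import json
from pathlib import Path


def next_resume_step_offset(checkpoint_dir) -> int:
    """Read global_step from epoch_1_info.json in a finished stage's checkpoint."""
    info_path = Path(checkpoint_dir) / "epoch_1_info.json"  # assumed location
    info = json.loads(info_path.read_text())
    step = info["global_step"]  # assumed top-level key
    if not isinstance(step, int) or step < 0:
        raise ValueError(f"unexpected global_step: {step!r}")
    return step
```

Reading the recorded step instead of a planned step count keeps the next stage aligned with where the previous stage actually stopped.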

Intended Use

This checkpoint is intended for:

continued pretraining experiments
Korean tokenizer and HRM-Text architecture experiments
terminal/tool-call/code pretraining research
checkpoint conversion and evaluation work

It is not yet intended as a finished assistant model.

Limitations

This is an intermediate checkpoint, not a final aligned instruct model.
The full planned continuation has not finished.
Final SFT and safety tuning have not been completed.
Public benchmark scores for this new checkpoint are not final.
Plain Transformers generation requires adding the custom hrm_text modeling wrapper or remote-code files.
Tool-call JSON validity and terminal action safety must be evaluated before production use.
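As a starting point for the tool-call validity check mentioned above, the sketch below parses one generated call and reports structural problems. The schema, an object with a string "name" and an object "arguments", is an assumption chosen for illustration, since this card does not define the tool-call format. check_tool_call is a hypothetical helper.

```python
import json


def check_tool_call(raw: str, allowed_tools: set[str]) -> list[str]:
    """Return a list of problems found in one tool-call string (empty if none)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value is not an object"]
    problems = []
    name = call.get("name")
    if not isinstance(name, str) or name not in allowed_tools:
        problems.append(f"unknown tool: {name!r}")
    if not isinstance(call.get("arguments"), dict):
        problems.append("arguments is not an object")
    return problems


print(check_tool_call('{"name": "ls", "arguments": {"path": "."}}', {"ls"}))
```

A structural check like this says nothing about whether the resulting terminal action is safe to run; that needs a separate evaluation.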

Citation

This work builds on HRM-Text:

Paper: https://arxiv.org/html/2605.20613
Upstream code: https://github.com/sapientinc/HRM-Text

한국어

KoHRM-Text-1.4B는 sapientinc/HRM-Text의 PrefixLM 학습 스택을 기반으로 처음부터 학습 중인 한국어/영어/코드/터미널/툴콜 모델입니다.

이 모델은 sapientinc/HRM-Text-1B를 이어서 파인튜닝한 모델이 아닙니다. 한국어와 터미널/툴콜 형식에 맞춰 새로 만든 131K byte-level BPE tokenizer를 사용하며, 가중치도 scratch pretraining으로 학습합니다.

현재 상태

이 저장소는 최신 공개 변환본을 계속 덮어쓰는 rolling latest model repo입니다. 학습은 아직 진행 중입니다.

메인 모델 repo: LLM-OS-Models/KoHRM-Text-1.4B
현재 공개 파일: model.safetensors, config.json, tokenizer 파일, README.md
raw FSDP2 resume checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints
prepared data: LLM-OS-Models/KoHRM-Text-1.4B-prepared-data
프로젝트 코드: https://github.com/LLM-OS-Models/KoHRM-text
원본 HRM-Text 코드: https://github.com/sapientinc/HRM-Text
HRM-Text 논문: https://arxiv.org/html/2605.20613
tokenizer repo: LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K

최신 공개 weight를 테스트하려면 revision="main"으로 다운로드하면 됩니다. 학습 중 10,000 step 단위로 새 checkpoint가 변환되어 올라오면 같은 파일명이 최신 EMA safetensors로 갱신됩니다.

중요한 호환성 안내

현재 공개 repo에는 변환된 model weight와 tokenizer가 있지만, 아직 Hugging Face trust_remote_code용 HrmTextForCausalLM 구현 파일은 포함되어 있지 않습니다.

현재 바로 가능한 것:

최신 공개 weight 다운로드
AutoTokenizer로 tokenizer 로드
config.json 확인
CPU 또는 Colab T4에서 model.safetensors 무결성 확인

아직 일반 Transformers에서 바로 안 되는 것:

AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")
이 repo만으로 one-line text generation 실행

이유는 model_type: "hrm_text"가 custom HRM-Text architecture이기 때문입니다. 공개 generation을 하려면 이 model repo에 HrmTextForCausalLM remote-code wrapper가 추가되어야 합니다.

모델 상세

항목	값
모델 ID	`LLM-OS-Models/KoHRM-Text-1.4B`
표준 이름	`KoHRM-Text-1.4B`
학습 출발점	scratch
아키텍처 계열	HRM-Text PrefixLM
아키텍처 크기	`XL`
파라미터	1,384,120,320
컨텍스트 길이	4,096 tokens
학습 dtype	bfloat16
공개 변환본 dtype	bfloat16 EMA `safetensors`
tokenizer	byte-level BPE, NFC normalization
vocabulary size	131,072
objective	PrefixLM response-only loss
optimizer	HRM-Text의 Adam-atan2
EMA	0.9999

변환된 config 주요 값:

{
  "model_type": "hrm_text",
  "architectures": ["HrmTextForCausalLM"],
  "vocab_size": 131072,
  "hidden_size": 1536,
  "num_hidden_layers": 32,
  "num_attention_heads": 12,
  "max_position_embeddings": 4096,
  "prefix_lm": true
}

토크나이저

토크나이저는 한국어, 영어, 코드, shell/terminal 텍스트, JSON/tool-call 형식을 고려해서 만들었습니다. 자주 쓰는 chat/tool special token은 가능한 한 안정적인 단일 token으로 유지합니다.

샘플 종류	chars/token
한국어 일반	2.60
한국어 법률	2.36
한국어 터미널 지시	2.18
shell command	2.68
tool-call JSON	3.32
Python code	3.37
영어	4.40

포맷 token:

<|im_start|>         instruction 시작
<|im_end|>           instruction 종료
<|box_end|>          response/end marker
<|object_ref_start|> direct condition
<|object_ref_end|>   chain-of-thought style condition
<|quad_start|>       noisy condition
<|quad_end|>         synthetic condition

프로젝트 내부 inference code가 쓰는 prompt 형식:

<|im_start|><|object_ref_start|>여기에_프롬프트를_넣습니다<|im_end|>

CPU / Colab T4 빠른 테스트

아래 코드는 CPU 환경이나 Colab T4 런타임에서 최신 공개 weight 파일을 확인하는 용도입니다. tokenizer, config, model.safetensors가 정상적으로 받아지고 읽히는지 검증합니다.

아직 public repo에 custom HRM-Text modeling wrapper가 없기 때문에 이 코드는 text generation을 실행하지 않습니다.

!pip -q install -U huggingface_hub transformers safetensors accelerate

from pathlib import Path
import json
import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
from safetensors.torch import load_file

repo_id = "LLM-OS-Models/KoHRM-Text-1.4B"

repo_dir = Path(snapshot_download(
    repo_id,
    revision="main",
    allow_patterns=[
        "README.md",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "model.safetensors",
    ],
))

print("Downloaded to:", repo_dir)
print("Runtime:", "cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

config = json.loads((repo_dir / "config.json").read_text())
print("model_type:", config["model_type"])
print("hidden_size:", config["hidden_size"])
print("vocab_size:", config["vocab_size"])
print("context:", config["max_position_embeddings"])

tokenizer = AutoTokenizer.from_pretrained(repo_dir, use_fast=True)
prompt = "<|im_start|><|object_ref_start|>한국어로 현재 디렉터리에서 가장 큰 파일 10개를 찾는 명령을 알려주세요.<|im_end|>"
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
print("prompt tokens:", len(ids))
print("first token ids:", ids[:20])

# CPU weight integrity check. 약 2.8GB bf16 weight를 CPU RAM에 로드합니다.
state = load_file(str(repo_dir / "model.safetensors"), device="cpu")
num_tensors = len(state)
num_params = sum(t.numel() for t in state.values())
first_key = next(iter(state))

print("num_tensors:", num_tensors)
print("num_params:", f"{num_params:,}")
print("first tensor:", first_key, tuple(state[first_key].shape), state[first_key].dtype)

정상 결과:

model_type은 hrm_text입니다.
vocab_size는 131072입니다.
num_params는 약 1.38B입니다.
tokenizer는 CPU와 Colab T4에서 정상 로드됩니다.
AutoModelForCausalLM generation은 remote-code wrapper가 추가되기 전까지는 안 되는 것이 정상입니다.

다음 코드는 현재 public repo 기준으로 실패할 수 있습니다.

from transformers import AutoModelForCausalLM
AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")

hrm_text architecture를 모른다는 오류가 나오면 현재 상태에서는 정상입니다.

내부 / 프로젝트 코드 기반 생성

내부 평가에서는 프로젝트 코드와 raw FSDP2 checkpoint를 사용합니다.

git clone https://github.com/LLM-OS-Models/KoHRM-text
cd KoHRM-text
pip install -r requirements.txt

그 다음 simple_inference_engine.py로 raw checkpoint를 로드합니다. 이 경로는 raw checkpoint 파일과 충분한 VRAM이 있는 CUDA 환경이 필요합니다. 아직 Colab T4용 권장 경로는 아닙니다.

학습 데이터

prepared data는 아래 dataset repo에 업로드합니다.

https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data

학습 objective는 PrefixLM response-only loss입니다. instruction/prompt token은 context로 보고, loss는 response span에만 적용합니다.

주요 prepared data group:

데이터 그룹	Tokens	용도
`koterm_pretrain_mix_v1`	711.3M	stage0/stage0b
HRM cleaned fast-cap stage1/stage1b	14.55B	HRM-style instruction pretraining
HRM cleaned full/no-cap stage2	14.55B	완료된 continuation
HRM cleaned full/no-cap extra stage2b	14.55B	예정된 continuation
local terminal conversations	9.39B	terminal/code/tool-heavy continuation
Korean tool/legal/wiki/finance mix	3.02B	한국어 domain/tool continuation
BCAI Finance Korean	857.7M	한국어 금융/domain data
Korean legal/admin task data	629.0M	한국어 법률/행정 data
Korean Wikipedia	462.5M	한국어 일반 텍스트
ToolBench train tool-call data	127.0M	tool-call pretraining
SWE-ZERO + GLM reasoning subsets	251.2M	code/reasoning data

평가 성격 데이터는 확인되는 범위에서 train에서 제외합니다. 예시는 ToolBench eval, Terminal Bench 계열 평가 데이터, benchmark 성격의 chi-bench입니다.

학습 진행

현재 run은 staged continuation 방식입니다.

stage0
-> stage0b
-> stage1
-> stage2
-> stage3
-> stage4
-> stage1b
-> stage2b
-> stage3b
-> stage4b

checkpoint는 model weights, optimizer state, EMA weights, recurrent carry state를 이어갑니다. resume_step_offset과 total_steps_override를 써서 stage마다 learning-rate schedule이 리셋되지 않고 긴 pretraining run처럼 이어지게 합니다.

2026-05-27 기준 stage1b가 진행 중이며, 이후 stage2b -> stage3b -> stage4b가 handoff watcher로 예약되어 있습니다. handoff는 각 stage의 실제 epoch_1_info.json global_step을 읽고 다음 stage를 시작합니다.

사용 목적

이 checkpoint는 다음 목적에 적합합니다.

continued pretraining 실험
한국어 tokenizer 및 HRM-Text architecture 실험
terminal/tool-call/code pretraining 연구
checkpoint conversion 및 evaluation 작업

아직 완성된 assistant model은 아닙니다.

제한 사항

중간 checkpoint이며 최종 aligned instruct model이 아닙니다.
전체 planned continuation이 아직 끝나지 않았습니다.
최종 SFT와 safety tuning이 아직 끝나지 않았습니다.
새 checkpoint의 public benchmark score는 아직 final이 아닙니다.
일반 Transformers generation은 custom hrm_text modeling wrapper 또는 remote-code file이 추가되어야 가능합니다.
tool-call JSON 유효성과 terminal action safety는 실제 사용 전에 별도 평가가 필요합니다.

인용

이 작업은 HRM-Text architecture와 training stack을 기반으로 합니다.

논문: https://arxiv.org/html/2605.20613
원본 코드: https://github.com/sapientinc/HRM-Text