| license: other | |
| language: | |
| - ko | |
| - en | |
| tags: | |
| - hrm-text | |
| - korean | |
| - terminal | |
| - tool-use | |
| - code | |
| - pretraining | |
| - prefix-lm | |
| library_name: transformers | |
| # KoHRM-Text-1.4B | |
| <a id="english"></a> | |
| ## English | |
| `KoHRM-Text-1.4B` is a scratch-pretrained Korean/English/code/terminal/tool-use model built from the `sapientinc/HRM-Text` PrefixLM training stack. | |
| This is **not** a continued finetune of `sapientinc/HRM-Text-1B`. It uses a new Korean/terminal-oriented 131K byte-level BPE tokenizer, and its weights are trained from scratch in a new run. | |
| ### Current Status | |
| This repository is a rolling **latest public model export**. Training is still in progress. | |
| - Main repo: `LLM-OS-Models/KoHRM-Text-1.4B` | |
| - Current public files: `model.safetensors`, `config.json`, tokenizer files, and this `README.md` | |
| - Raw FSDP2 resume checkpoints: `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` | |
| - Prepared data: `LLM-OS-Models/KoHRM-Text-1.4B-prepared-data` | |
| - Project code: https://github.com/LLM-OS-Models/KoHRM-text | |
| - Upstream HRM-Text code: https://github.com/sapientinc/HRM-Text | |
| - HRM-Text paper: https://arxiv.org/html/2605.20613 | |
| - Tokenizer repo: `LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K` | |
| The `main` branch is overwritten with the newest converted EMA `safetensors` export each time a training checkpoint is converted and uploaded, at 10,000-step intervals; the file names stay the same. To test the latest public weights, download `revision="main"`. | |
| ### Important Compatibility Note | |
| The public repo currently contains the converted model weights and tokenizer, but it does **not yet** include a Hugging Face `trust_remote_code` modeling implementation for `HrmTextForCausalLM`. | |
| What works today: | |
| - Download the latest public weights. | |
| - Load the tokenizer with `AutoTokenizer`. | |
| - Inspect `config.json`. | |
| - Verify `model.safetensors` on CPU or Colab T4. | |
| What is not supported yet in plain Transformers: | |
| - `AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")` | |
| - One-line hosted text generation from this repo | |
| Expected reason: `model_type: "hrm_text"` is a custom HRM-Text architecture. Public generation will require adding the compatible `HrmTextForCausalLM` remote-code files to this model repo or releasing a standard wrapper. | |
| ### Model Details | |
| | Field | Value | | |
| |---|---:| | |
| | Model id | `LLM-OS-Models/KoHRM-Text-1.4B` | | |
| | Standard name | `KoHRM-Text-1.4B` | | |
| | Training origin | scratch | | |
| | Architecture family | HRM-Text PrefixLM | | |
| | Architecture size | `XL` | | |
| | Parameters | 1,384,120,320 | | |
| | Context length | 4,096 tokens | | |
| | Training dtype | bfloat16 | | |
| | Public export dtype | bfloat16 EMA `safetensors` | | |
| | Tokenizer | byte-level BPE, NFC normalization | | |
| | Vocabulary size | 131,072 | | |
| | Objective | PrefixLM response-only loss | | |
| | Optimizer | Adam-atan2 from upstream HRM-Text | | |
| | EMA | 0.9999 | | |
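The EMA value in the table is a decay factor. As a minimal sketch, assuming the standard exponential-moving-average update (this is not code from the HRM-Text repository, and `ema_update` and `ema_horizon` are hypothetical helper names), a decay of 0.9999 averages the weights over roughly 10,000 optimizer steps:

```python
def ema_update(ema_weights, weights, decay=0.9999):
    """Standard EMA step: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


def ema_horizon(decay):
    """Approximate number of steps the moving average looks back over."""
    return 1.0 / (1.0 - decay)


ema = [0.0, 1.0]
ema = ema_update(ema, [1.0, 1.0])
print(ema)                         # the first entry moves 1e-4 of the way to 1.0
print(round(ema_horizon(0.9999)))  # 10000
```

This is why the exported EMA weights change slowly between nearby checkpoints.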
| Converted config highlights: | |
| ```json | |
| { | |
| "model_type": "hrm_text", | |
| "architectures": ["HrmTextForCausalLM"], | |
| "vocab_size": 131072, | |
| "hidden_size": 1536, | |
| "num_hidden_layers": 32, | |
| "num_attention_heads": 12, | |
| "max_position_embeddings": 4096, | |
| "prefix_lm": true | |
| } | |
| ``` | |
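A few derived numbers follow directly from these config values and the parameter count in the table. The sketch below is plain arithmetic. It assumes nothing about the internal HRM-Text layer layout, and it counts a single embedding matrix because the card does not say whether input and output embeddings are tied.

```python
config = {
    "vocab_size": 131072,
    "hidden_size": 1536,
    "num_hidden_layers": 32,
    "num_attention_heads": 12,
}
total_params = 1_384_120_320  # from the Model Details table

# Per-head width of the attention projection.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# One vocab_size x hidden_size matrix (tying is not stated, so count one).
embedding_params = config["vocab_size"] * config["hidden_size"]

# bfloat16 stores 2 bytes per parameter.
bf16_bytes = total_params * 2

print("head_dim:", head_dim)                          # 128
print("embedding params:", f"{embedding_params:,}")   # 201,326,592
print("bf16 size (GB):", round(bf16_bytes / 1e9, 2))  # 2.77
```

The bf16 size matches the "about 2.8GB" figure quoted for the CPU weight check.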
| ### Compared With The HRM-Text Paper | |
| This run can take longer than the paper recipe even on 8 x H200 because the setup is not identical: | |
| - The paper reference used 16 x H100; this run uses 8 x H200. | |
| - KoHRM uses a larger 131K tokenizer vocabulary, compared with the upstream 65K tokenizer. | |
| - The public KoHRM size is about 1.38B parameters. | |
| - The stable long-run batch is `180,224` tokens/step, chosen after OOM probing; larger batches fit at first but proved less stable over long runs. | |
| - The continuation includes extra Korean, terminal, tool-call, legal, finance, wiki, and repeated HRM-cleaned stages. | |
| This does not automatically guarantee better benchmark scores. The expected upside is domain-specific: Korean tokenization efficiency, Korean legal/finance/wiki coverage, terminal trajectories, tool-call formatting, and code-oriented behavior are the areas most likely to improve on the upstream English/general checkpoint. Final claims require evaluation after the planned continuation and SFT finish. | |
| ### Tokenizer | |
| The tokenizer was trained for Korean, English, code, shell/terminal text, and JSON/tool-call formats. It keeps common chat/tool special tokens as stable single tokens where possible. | |
| | Sample bucket | chars/token | | |
| |---|---:| | |
| | Korean general text | 2.60 | | |
| | Korean legal text | 2.36 | | |
| | Korean terminal instruction | 2.18 | | |
| | shell command | 2.68 | | |
| | tool-call JSON | 3.32 | | |
| | Python code | 3.37 | | |
| | English | 4.40 | | |
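The chars/token ratios above can be turned into rough budget estimates. The sketch below is only an approximation based on the table averages; real token counts depend on the actual text, and `estimate_tokens` and `chars_that_fit` are hypothetical helper names.

```python
import math

# Average characters per token, copied from the table above.
CHARS_PER_TOKEN = {
    "Korean general text": 2.60,
    "Korean legal text": 2.36,
    "Korean terminal instruction": 2.18,
    "shell command": 2.68,
    "tool-call JSON": 3.32,
    "Python code": 3.37,
    "English": 4.40,
}
CONTEXT_TOKENS = 4096  # context length from the Model Details table


def estimate_tokens(num_chars, bucket):
    """Rough token count for num_chars characters of the given kind of text."""
    return math.ceil(num_chars / CHARS_PER_TOKEN[bucket])


def chars_that_fit(bucket, context_tokens=CONTEXT_TOKENS):
    """Rough number of characters that fill the whole context window."""
    return int(context_tokens * CHARS_PER_TOKEN[bucket])


print(estimate_tokens(1000, "Korean general text"))  # 385
print(chars_that_fit("Korean general text"))         # 10649
print(chars_that_fit("English"))                     # 18022
```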
| Formatting tokens: | |
| ```text | |
| <|im_start|> instruction start | |
| <|im_end|> instruction end | |
| <|box_end|> response/end marker | |
| <|object_ref_start|> direct condition | |
| <|object_ref_end|> chain-of-thought style condition | |
| <|quad_start|> noisy condition | |
| <|quad_end|> synthetic condition | |
| ``` | |
| Prompt format used by the project-side inference code: | |
| ```text | |
| <|im_start|><|object_ref_start|>YOUR_PROMPT_HERE<|im_end|> | |
| ``` | |
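As an illustration of this format, a prompt string can be assembled with a small helper. This is a sketch, not the project's `tokenize_prompt` code. Only `"direct"` is a condition name confirmed by this card. The other dictionary keys are hypothetical labels for the condition tokens listed above, and applying the same layout to them is an assumption.

```python
# Condition tokens as listed in the formatting-token table.
# Keys other than "direct" are hypothetical labels chosen for this sketch.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "noisy": "<|quad_start|>",
    "synthetic": "<|quad_end|>",
}


def build_prompt(text, condition="direct"):
    """Wrap user text as <|im_start|><condition token>text<|im_end|>."""
    try:
        cond = CONDITION_TOKENS[condition]
    except KeyError:
        raise ValueError(f"unknown condition: {condition!r}") from None
    return f"<|im_start|>{cond}{text}<|im_end|>"


print(build_prompt("List the 10 largest files in this directory."))
```

When tokenizing a string built this way, pass `add_special_tokens=False` so the tokenizer does not add further special tokens around it.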
| ### CPU / Colab T4 Quick Test | |
| Use this to test the **latest public weight files** on CPU or a Colab T4 runtime. This verifies that the tokenizer, config, and `model.safetensors` are downloadable and readable. | |
| It does not run text generation yet, because the public repo does not yet ship the custom HRM-Text modeling wrapper. | |
| ```python | |
| !pip -q install -U huggingface_hub transformers safetensors accelerate | |
| ``` | |
| ```python | |
| from pathlib import Path | |
| import json | |
| import torch | |
| from huggingface_hub import snapshot_download | |
| from transformers import AutoTokenizer | |
| from safetensors.torch import load_file | |
| repo_id = "LLM-OS-Models/KoHRM-Text-1.4B" | |
| repo_dir = Path(snapshot_download( | |
| repo_id, | |
| revision="main", | |
| allow_patterns=[ | |
| "README.md", | |
| "config.json", | |
| "tokenizer.json", | |
| "tokenizer_config.json", | |
| "special_tokens_map.json", | |
| "model.safetensors", | |
| ], | |
| )) | |
| print("Downloaded to:", repo_dir) | |
| print("Runtime:", "cuda" if torch.cuda.is_available() else "cpu") | |
| if torch.cuda.is_available(): | |
|     print("GPU:", torch.cuda.get_device_name(0)) | |
| config = json.loads((repo_dir / "config.json").read_text()) | |
| print("model_type:", config["model_type"]) | |
| print("hidden_size:", config["hidden_size"]) | |
| print("vocab_size:", config["vocab_size"]) | |
| print("context:", config["max_position_embeddings"]) | |
| tokenizer = AutoTokenizer.from_pretrained(repo_dir, use_fast=True) | |
| # Korean test prompt: "Please tell me, in Korean, the command to find the 10 largest files in the current directory." | |
| prompt = "<|im_start|><|object_ref_start|>한국어로 현재 디렉터리에서 가장 큰 파일 10개를 찾는 명령을 알려주세요.<|im_end|>" | |
| ids = tokenizer(prompt, add_special_tokens=False)["input_ids"] | |
| print("prompt tokens:", len(ids)) | |
| print("first token ids:", ids[:20]) | |
| # CPU weight integrity check. This loads about 2.8GB of bf16 weights into CPU RAM. | |
| state = load_file(str(repo_dir / "model.safetensors"), device="cpu") | |
| num_tensors = len(state) | |
| num_params = sum(t.numel() for t in state.values()) | |
| first_key = next(iter(state)) | |
| print("num_tensors:", num_tensors) | |
| print("num_params:", f"{num_params:,}") | |
| print("first tensor:", first_key, tuple(state[first_key].shape), state[first_key].dtype) | |
| ``` | |
| Expected result: | |
| - `model_type` should be `hrm_text`. | |
| - `vocab_size` should be `131072`. | |
| - `num_params` should be around `1.38B`. | |
| - Tokenizer loading should work on CPU and Colab T4. | |
| - `AutoModelForCausalLM` generation is expected to be unavailable until remote-code support is added. | |
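The expectations above can be checked programmatically once `config` and `num_params` are available from the quick test. The helper below is a sketch. `check_export` is a hypothetical name, and the 1% tolerance on the parameter count is an arbitrary choice to allow for buffers or tied tensors in the export.

```python
EXPECTED_CONFIG = {
    "model_type": "hrm_text",
    "vocab_size": 131072,
    "hidden_size": 1536,
    "max_position_embeddings": 4096,
}
EXPECTED_PARAMS = 1_384_120_320  # from the Model Details table


def check_export(config, num_params, tolerance=0.01):
    """Return a list of mismatches; an empty list means the export looks right."""
    problems = []
    for key, want in EXPECTED_CONFIG.items():
        got = config.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    if abs(num_params - EXPECTED_PARAMS) > tolerance * EXPECTED_PARAMS:
        problems.append(f"num_params: expected about {EXPECTED_PARAMS:,}, got {num_params:,}")
    return problems


# Stand-in values; in the quick test, pass the loaded config and num_params.
sample_config = dict(EXPECTED_CONFIG)
print(check_export(sample_config, 1_384_120_320))  # []
```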
| If you try this: | |
| ```python | |
| from transformers import AutoModelForCausalLM | |
| AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B") | |
| ``` | |
| and it fails with an unknown `hrm_text` architecture, that is expected for the current public export. | |
| ### Internal / Project-Side Generation | |
| For actual generation today, use the project code and raw FSDP2 checkpoints. This is the currently supported copy-paste path for CUDA machines. A BF16-capable GPU with enough VRAM is recommended; Colab T4 is useful for the smoke test above, not for this raw-checkpoint generation path. | |
| ```bash | |
| git clone https://github.com/LLM-OS-Models/KoHRM-text | |
| cd KoHRM-text | |
| python -m venv .venv | |
| source .venv/bin/activate | |
| pip install -U pip wheel | |
| pip install -r requirements.txt | |
| pip install -U "huggingface_hub[cli]" | |
| export TOKENIZERS_PARALLELISM=false | |
| export NUMEXPR_MAX_THREADS=128 | |
| ``` | |
| Download an uploaded raw checkpoint. This example uses `stage1b-hrm-fastcap-repeat-step310000`, which is available in the raw checkpoint repo. When a newer raw checkpoint is uploaded, change the include path, `ckpt_dir`, and `ckpt_step` to match. | |
| ```bash | |
| mkdir -p checkpoints/kohm-raw | |
| huggingface-cli download LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints \ | |
| --include "stage1b-hrm-fastcap-repeat-step310000/**" \ | |
| --local-dir checkpoints/kohm-raw | |
| ``` | |
| Create and run a minimal generation script: | |
| ```bash | |
| cat > run_kohrm_raw_generate.py <<'PY' | |
| import os | |
| os.environ.setdefault("TOKENIZERS_PARALLELISM", "false") | |
| os.environ.setdefault("NUMEXPR_MAX_THREADS", "128") | |
| from simple_inference_engine import inference_load_checkpoint, inference_generate | |
| ckpt_dir = "checkpoints/kohm-raw/stage1b-hrm-fastcap-repeat-step310000" | |
| prompts = [ | |
| ( | |
| 0, | |
| ( | |
| "direct", | |
| # Korean: "In polite Korean, please suggest a bash command that finds the 10 largest files in the current directory." | |
| "한국어 존댓말로 현재 디렉터리에서 용량이 가장 큰 파일 10개를 찾는 bash 명령을 제안해 주세요.", | |
| ), | |
| ), | |
| ( | |
| 1, | |
| ( | |
| "direct", | |
| "Write a Python function that validates a JSON tool-call object with name and arguments.", | |
| ), | |
| ), | |
| ] | |
| ckpt = inference_load_checkpoint( | |
| ckpt_path=ckpt_dir, | |
| ckpt_epoch=None, | |
| ckpt_step=310000, | |
| ckpt_use_ema=True, | |
| device="cuda", | |
| ) | |
| for pid, text in inference_generate( | |
| ckpt, | |
| iter(prompts), | |
| max_tokens=1024, | |
| max_generation=256, | |
| batch_size=1, | |
| temp=0.0, | |
| ): | |
|     print(f"\n### sample {pid}\n{text}") | |
| PY | |
| python run_kohrm_raw_generate.py | |
| ``` | |
| Prompt format is handled by `InferenceCheckpoint.tokenize_prompt`. The first tuple item is the condition string, usually `"direct"`, and the second item is the user prompt. Internally this becomes: | |
| ```text | |
| <|im_start|><|object_ref_start|>PROMPT<|im_end|> | |
| ``` | |
| If you want to test a newer raw checkpoint: | |
| 1. Check the raw checkpoint repo for the newest uploaded stage/step. | |
| 2. Change the `huggingface-cli download --include` pattern. | |
| 3. Change `ckpt_dir`. | |
| 4. Change `ckpt_step`. | |
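The three values changed in the steps above all derive from the checkpoint folder name. The sketch below assumes that folder names keep ending in `-step<N>`, as the documented example does. `checkpoint_settings` is a hypothetical helper, not part of the project code.

```python
import re


def checkpoint_settings(folder, local_root="checkpoints/kohm-raw"):
    """Derive the download pattern, ckpt_dir and ckpt_step from a folder name."""
    match = re.search(r"-step(\d+)$", folder)
    if match is None:
        raise ValueError(f"no '-step<N>' suffix in {folder!r}")
    return {
        "include": f"{folder}/**",             # for huggingface-cli download --include
        "ckpt_dir": f"{local_root}/{folder}",  # for the generation script
        "ckpt_step": int(match.group(1)),      # for inference_load_checkpoint
    }


settings = checkpoint_settings("stage1b-hrm-fastcap-repeat-step310000")
for key, value in settings.items():
    print(f"{key}: {value}")
```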
| Plain `AutoModelForCausalLM` generation from `model.safetensors` will be added later when the public `trust_remote_code` wrapper is available. | |
| ### Training Data | |
| Prepared data artifacts are uploaded to: | |
| https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data | |
| The training objective is PrefixLM response-only loss. Instruction/prompt tokens are visible as context, while loss is applied to the response span. | |
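To make the objective concrete, the sketch below builds response-only labels and a PrefixLM attention mask for one short sequence. It rests on assumptions: `-100` is the PyTorch-style ignore label, the bidirectional-prefix mask is the textbook PrefixLM definition rather than something confirmed from the HRM-Text code, and the shift for next-token prediction is left out for brevity.

```python
IGNORE_INDEX = -100  # assumption: PyTorch-style label that the loss skips


def response_only_labels(token_ids, prefix_len):
    """Keep labels in the response span and ignore the prompt span."""
    return [IGNORE_INDEX] * prefix_len + list(token_ids[prefix_len:])


def prefix_lm_mask(seq_len, prefix_len):
    """mask[i][j] is True when position i may attend to position j.

    Prompt positions are visible to every position; response positions
    are visible only causally (j <= i).
    """
    return [[j < prefix_len or j <= i for j in range(seq_len)]
            for i in range(seq_len)]


tokens = [11, 12, 13, 21, 22]  # 3 prompt tokens followed by 2 response tokens
labels = response_only_labels(tokens, prefix_len=3)
mask = prefix_lm_mask(len(tokens), prefix_len=3)
print(labels)  # [-100, -100, -100, 21, 22]
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```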
| Major prepared data groups: | |
| | Dataset group | Tokens | Use | | |
| |---|---:|---| | |
| | `koterm_pretrain_mix_v1` | 711.3M | stage0/stage0b | |
| | HRM cleaned fast-cap stage1/stage1b | 14.55B | HRM-style instruction pretraining | | |
| | HRM cleaned full/no-cap stage2 | 14.55B | completed continuation | | |
| | HRM cleaned full/no-cap extra stage2b | 14.55B | active continuation | | |
| | Local terminal conversations | 9.39B | terminal/code/tool-heavy continuation | | |
| | Korean tool/legal/wiki/finance mix | 3.02B | Korean domain and tool continuation | | |
| | BCAI Finance Korean | 857.7M | Korean finance/domain data | | |
| | Korean legal/admin task data | 629.0M | Korean legal/admin data | | |
| | Korean Wikipedia | 462.5M | Korean general text | | |
| | ToolBench train tool-call data | 127.0M | tool-call pretraining | | |
| | SWE-ZERO + GLM reasoning subsets | 251.2M | code/reasoning data | | |
| Evaluation-like datasets are excluded where identified, including ToolBench eval, Terminal Bench style evaluation data, and benchmark-oriented `chi-bench` data. | |
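For a sense of scale, the token counts in the table can be converted to optimizer steps at the stable batch of `180,224` tokens/step. The sketch assumes one pass over each group and that every token counts toward the batch. The table values are rounded, so the results are estimates, not figures from the training logs.

```python
import math

TOKENS_PER_STEP = 180_224  # stable long-run batch quoted in this card
CONTEXT_TOKENS = 4_096     # context length

# Rounded token counts copied from the table above.
GROUP_TOKENS = {
    "koterm_pretrain_mix_v1": 711_300_000,
    "HRM cleaned fast-cap stage1/stage1b": 14_550_000_000,
    "Local terminal conversations": 9_390_000_000,
}


def steps_for(tokens, tokens_per_step=TOKENS_PER_STEP):
    """Steps needed to consume `tokens` once at the given batch size."""
    return math.ceil(tokens / tokens_per_step)


print(TOKENS_PER_STEP // CONTEXT_TOKENS)  # 44 full-length sequences per step
for name, tokens in GROUP_TOKENS.items():
    print(f"{name}: about {steps_for(tokens):,} steps")
```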
| ### Training Run | |
| The current run uses staged continuation: | |
| ```text | |
| stage0 | |
| -> stage0b | |
| -> stage1 | |
| -> stage2 | |
| -> stage3 | |
| -> stage4 | |
| -> stage1b | |
| -> stage2b | |
| -> stage3b | |
| -> stage4b | |
| -> stage1c | |
| -> stage2c | |
| -> stage3c | |
| -> stage4c | |
| ``` | |
| The checkpoint carries model weights, optimizer state, EMA weights, and recurrent carry state. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer run instead of resetting at each stage. | |
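The effect of `resume_step_offset` and `total_steps_override` can be illustrated with a toy schedule. Only those two parameter names come from this card. The warmup-plus-cosine shape, `peak_lr`, `warmup_steps`, `min_ratio`, and the step counts are assumptions for the sketch, and the project's actual schedule may differ.

```python
import math


def lr_at(local_step, resume_step_offset, total_steps_override,
          peak_lr=1e-3, warmup_steps=1_000, min_ratio=0.1):
    """Learning rate at a stage-local step, evaluated on the global schedule."""
    step = resume_step_offset + local_step  # position in the full run
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    span = max(1, total_steps_override - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)


# A stage that resumes at global step 50,000 continues the same curve ...
resumed = lr_at(0, resume_step_offset=50_000, total_steps_override=400_000)
unbroken = lr_at(50_000, resume_step_offset=0, total_steps_override=400_000)
# ... whereas a schedule reset at the stage boundary would restart at warmup.
reset = lr_at(0, resume_step_offset=0, total_steps_override=400_000)
print(resumed == unbroken, resumed > reset)  # True True
```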
| As of 2026-05-27, `stage2b` is active. The continuation watcher is scheduled to launch `stage3b -> stage4b -> stage1c -> stage2c -> stage3c -> stage4c`, starting each stage once the previous one has written its completed checkpoint. Each handoff reads the actual `global_step` from the completed checkpoint's `epoch_1_info.json` before starting the next stage. | |
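The handoff logic can be sketched as follows. The stage order and the `epoch_1_info.json` / `global_step` names come from this card. The assumption that the file is a JSON object with a top-level `global_step` key, the idea that this value becomes the next stage's `resume_step_offset`, and the helper names are illustrative and not taken from the watcher's code.

```python
import json
import tempfile
from pathlib import Path

STAGE_ORDER = [
    "stage0", "stage0b", "stage1", "stage2", "stage3", "stage4",
    "stage1b", "stage2b", "stage3b", "stage4b",
    "stage1c", "stage2c", "stage3c", "stage4c",
]


def next_stage(current):
    """Stage that follows `current`, or None after the last stage."""
    idx = STAGE_ORDER.index(current)
    return STAGE_ORDER[idx + 1] if idx + 1 < len(STAGE_ORDER) else None


def read_global_step(checkpoint_dir):
    """Read the step actually reached by a completed stage."""
    info = json.loads((Path(checkpoint_dir) / "epoch_1_info.json").read_text())
    return int(info["global_step"])


def plan_handoff(current_stage, checkpoint_dir):
    """Settings for the next stage, based on the completed checkpoint."""
    return {
        "stage": next_stage(current_stage),
        "resume_step_offset": read_global_step(checkpoint_dir),
    }


# Demo with a stand-in checkpoint folder (the step value is made up).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "epoch_1_info.json").write_text(json.dumps({"global_step": 123456}))
    plan = plan_handoff("stage2b", tmp)
print(plan)
```

Reading the step from the file instead of hard-coding it keeps the schedule aligned even if a stage stops at a different step than planned.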
| ### Intended Use | |
| This checkpoint is intended for: | |
| - continued pretraining experiments | |
| - Korean tokenizer and HRM-Text architecture experiments | |
| - terminal/tool-call/code pretraining research | |
| - checkpoint conversion and evaluation work | |
| It is not yet intended as a finished assistant model. | |
| ### Limitations | |
| - This is an intermediate checkpoint, not a final aligned instruct model. | |
| - The full planned continuation has not finished. | |
| - Final SFT and safety tuning have not been completed. | |
| - Public benchmark scores for this new checkpoint are not final. | |
| - Plain Transformers generation requires adding the custom `hrm_text` modeling wrapper or remote-code files. | |
| - Tool-call JSON validity and terminal action safety must be evaluated before production use. | |
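As a starting point for the tool-call validity evaluation mentioned above, generated output can be checked structurally. The schema in this sketch (a JSON object with a string `name` and an object `arguments`) is an assumption based on the example prompt in this card, not a documented KoHRM output format.

```python
import json


def validate_tool_call(raw):
    """Return a list of problems; an empty list means the call looks valid."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top level must be a JSON object"]
    problems = []
    name = obj.get("name")
    if not isinstance(name, str) or not name:
        problems.append("'name' must be a non-empty string")
    if not isinstance(obj.get("arguments"), dict):
        problems.append("'arguments' must be a JSON object")
    return problems


good = '{"name": "list_files", "arguments": {"path": ".", "limit": 10}}'
bad = '{"name": "", "arguments": "path=."}'
print(validate_tool_call(good))  # []
print(validate_tool_call(bad))
```

A structural check like this says nothing about whether the requested action is safe to run; terminal actions need a separate review.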
| ### Citation | |
| This work builds on HRM-Text: | |
| - Paper: https://arxiv.org/html/2605.20613 | |
| - Upstream code: https://github.com/sapientinc/HRM-Text | |