Instructions for using etanlightstone/simple-lm-sft-science with libraries and local apps.

Libraries

How to use etanlightstone/simple-lm-sft-science with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="etanlightstone/simple-lm-sft-science", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("etanlightstone/simple-lm-sft-science", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use etanlightstone/simple-lm-sft-science with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the repo ships custom model code, so remote code must be trusted):
vllm serve "etanlightstone/simple-lm-sft-science" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "etanlightstone/simple-lm-sft-science",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
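
The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes the server started above is listening on localhost:8000; the helper names build_chat_request and ask are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL = "etanlightstone/simple-lm-sft-science"


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(question, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling ask("What is photosynthesis?") requires a running server. For an SGLang server on the port used in this card, pass base_url="http://localhost:30000".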

Use Docker

docker model run hf.co/etanlightstone/simple-lm-sft-science

SGLang

How to use etanlightstone/simple-lm-sft-science with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "etanlightstone/simple-lm-sft-science" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "etanlightstone/simple-lm-sft-science",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "etanlightstone/simple-lm-sft-science" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "etanlightstone/simple-lm-sft-science",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use etanlightstone/simple-lm-sft-science with Docker Model Runner:
```
docker model run hf.co/etanlightstone/simple-lm-sft-science
```

SimpleLM SFT

SimpleLM is a custom decoder-only Transformer, supervised fine-tuned (SFT) on the MegaScience corpus for science question answering. The architecture is defined in modeling_simple_lm.py (bundled in this repo) and is loaded via trust_remote_code=True.

SFT source checkpoint: models/sft_full_science.pt
Pretraining checkpoint: /home/etan/simple_llm/checkpoints/lm_checkpoint_008_shutdown.pt
Training data: /home/etan/simple_llm/datasets/MegaScience/data
subject_filter: None
subject_exclude: ['math']
question_regex_filter: None
SFT epochs: 1 at learning_rate 3e-05
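
The three filter settings above can be read as a per-row predicate. The sketch below is an illustration only: it assumes each MegaScience row carries `subject` and `question` fields and that subject_filter acts as an allow-list, neither of which is confirmed by this card. keep_row is a hypothetical helper, not the author's training code.

```python
import re


def keep_row(row, subject_filter=None, subject_exclude=("math",),
             question_regex_filter=None):
    """Return True if a row passes all filters; a filter set to None is disabled."""
    subject = row.get("subject")  # assumed field name
    if subject_filter is not None and subject not in subject_filter:
        return False
    if subject_exclude is not None and subject in subject_exclude:
        return False
    if question_regex_filter is not None and not re.search(
        question_regex_filter, row.get("question", "")
    ):
        return False
    return True


rows = [
    {"subject": "physics", "question": "What is inertia?"},
    {"subject": "math", "question": "What is 2 + 2?"},
]
kept = [r for r in rows if keep_row(r)]  # drops the math row
```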

Prompt format

This model was fine-tuned on a single fixed prompt template, so queries that deviate from it are likely to produce noticeably worse output. The packaged chat_template.jinja reproduces the format, so tokenizer.apply_chat_template(...) returns strings byte-identical to what the model saw during training. Each training example had the form:

Question: What is photosynthesis?
Answer: <answer></s>

Equivalently, with the chat template:

tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is photosynthesis?"}],
    add_generation_prompt=True, tokenize=False,
)
# -> 'Question: What is photosynthesis?\nAnswer: '
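
For code paths that do not go through the tokenizer, the same string can be built by hand. This is a minimal sketch using the prompt_template and completion_suffix values listed under Training settings; the function names are illustrative.

```python
PROMPT_TEMPLATE = "Question: {question}\nAnswer: "
COMPLETION_SUFFIX = "</s>"


def build_prompt(question):
    """Inference-time prompt: ends right after 'Answer: '."""
    return PROMPT_TEMPLATE.format(question=question)


def build_training_text(question, answer):
    """Full training string: prompt, answer, then the end-of-sequence suffix."""
    return build_prompt(question) + answer + COMPLETION_SUFFIX


print(repr(build_prompt("What is photosynthesis?")))
# -> 'Question: What is photosynthesis?\nAnswer: '
```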

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "etanlightstone/simple-lm-sft-science"
tok   = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()

messages = [{"role": "user", "content": "What is photosynthesis?"}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
prompt_len = inputs["input_ids"].shape[1]
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        repetition_penalty=1.1,
    )
answer = tok.decode(out[0, prompt_len:], skip_special_tokens=True)
print(answer)
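
The model's context_length is 512 (see Architecture), so the prompt and the generated tokens share that budget. The sketch below clamps max_new_tokens accordingly. clamp_new_tokens is a hypothetical helper, and this card does not state how the custom model behaves beyond 512 positions.

```python
CONTEXT_LENGTH = 512  # from the Architecture table


def clamp_new_tokens(prompt_len, requested=256, context_length=CONTEXT_LENGTH):
    """Largest max_new_tokens that keeps prompt + output within the context."""
    if prompt_len >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_length - prompt_len)
```

In the example above, this would be used as max_new_tokens=clamp_new_tokens(prompt_len).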

Architecture

field	value
vocab_size	32000
context_length	512
d_model	768
n_layers	12
n_heads	8
d_ff	2048
activation	gelu
bias	True
tie_word_embeddings	True

Tokenizer source: TinyLlama/TinyLlama-1.1B-Chat-v1.0
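
The table is consistent with the 91,138,560 total parameters reported under Training settings. The tally below is a sketch that assumes a GPT-2-style layout: learned positional embeddings, two LayerNorms per block plus a final LayerNorm (each with weight and bias), and an output head tied to the token embedding. These layout details are inferred from the arithmetic, not read from modeling_simple_lm.py.

```python
vocab_size, context_length = 32000, 512
d_model, n_layers, d_ff = 768, 12, 2048


def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


layer_norm = 2 * d_model                      # weight + bias (assumed)
attention = 4 * linear(d_model, d_model)      # Q, K, V and output projections
mlp = linear(d_model, d_ff) + linear(d_ff, d_model)
block = attention + mlp + 2 * layer_norm

# Token embeddings plus learned positional embeddings (assumed).
embeddings = vocab_size * d_model + context_length * d_model
total = embeddings + n_layers * block + layer_norm  # tied head adds nothing

print(f"{total:,}")  # 91,138,560
```

The number of heads (n_heads = 8) does not change the count, since the heads only partition the d_model-wide projections.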

Training settings

{
  "mode": "sft",
  "source_pretrain_checkpoint": "/home/etan/simple_llm/checkpoints/lm_checkpoint_008_shutdown.pt",
  "source_pretrain_train_settings": {
    "batch_size": 10,
    "batch_size_note": "per GPU when using torchrun",
    "world_size": 1,
    "learning_rate": 0.0003,
    "weight_decay": 0.01,
    "num_epochs": 3,
    "max_steps": null,
    "grad_clip": 1.0,
    "seed": 42,
    "docs_dir": "/home/etan/simple_llm/docs",
    "block_size": 512,
    "stride": 448,
    "stride_overlap_tokens": 64
  },
  "data_dir": "/home/etan/simple_llm/datasets/MegaScience/data",
  "data_glob": "*.parquet",
  "subject_filter": null,
  "subject_exclude": [
    "math"
  ],
  "question_regex_filter": null,
  "batch_size": 10,
  "world_size": 1,
  "learning_rate": 3e-05,
  "min_lr": 3e-06,
  "warmup_steps": 200,
  "weight_decay": 0.0,
  "num_epochs": 1,
  "max_steps": null,
  "grad_clip": 1.0,
  "seed": 42,
  "block_size": 512,
  "eval_fraction": 0.005,
  "eval_every": 500,
  "max_train_examples": null,
  "freezing": {
    "freeze_embeddings": false,
    "freeze_lm_head": false,
    "freeze_blocks_below": 0,
    "tie_word_embeddings": true,
    "trainable_params": 91138560,
    "total_params": 91138560,
    "frozen_params": 0,
    "frozen_blocks": 0,
    "total_blocks": 12
  },
  "prompt_template": "Question: {question}\nAnswer: ",
  "completion_suffix": "</s>"
}
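
In the pretraining settings, stride_overlap_tokens equals block_size minus stride (512 - 448 = 64), which suggests overlapping fixed-size windows over each tokenized document. The sketch below shows that windowing, assuming a plain sliding window. The function name and the handling of the final short window are illustrative and not taken from the author's data loader.

```python
def sliding_windows(token_ids, block_size=512, stride=448):
    """Split a token list into windows of block_size that start every stride tokens."""
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + block_size])
        if start + block_size >= len(token_ids):
            break  # this window reached the end of the document
        start += stride
    return windows


tokens = list(range(1000))
chunks = sliding_windows(tokens)
# Consecutive full windows share block_size - stride = 64 tokens.
shared = set(chunks[0]) & set(chunks[1])
```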

Safetensors

Model size: 91.1M params
Tensor type: F32