Kepler GGUF

Kepler is an 8B astrodynamics & quantitative-astrophysics reasoning model, fine-tuned from Qwen/Qwen3-8B to answer orbital-mechanics and astrophysics word problems with a short worked chain and a single \boxed{} numeric answer. It is built for the operator who wants a local, private, $0-per-query numeric reasoner that runs entirely inside an NVIDIA DGX Spark (GB10, 128 GB unified memory): no API, no network, no per-token bill.

The differentiator is discipline, not size: an SFT pass on a verifier-checked corpus taught Kepler to answer rather than ruminate. It boxes a final answer on 100% of held-out problems with 0% truncation, at roughly 3× the conciseness of frontier cloud models on the same task (~166 output tokens vs ~460–490). Every claim below is a measured run on the Spark, not a wishlist.

The GGUF quantizations are listed below; the recommended variant is Q8_0 (effectively lossless).

Spark-tested

Per-variant accuracy on the held-out astro benchmark: the quantization ladder. Scored with the same \boxed-extracting, SI-unit-normalized, ±2%-relative-tolerance verifier the model was trained against (astro-bench v0.1, n=44 off-template problems, constants given in-prompt).

| Variant | Size | Perplexity (wikitext-2) | tok/s on Spark | astro-bench v0.1 held-out (n=44, \boxed ±2%) |
|---|---|---|---|---|
| Q4_K_M | 4.7 GB | - | - | 75.0% |
| Q5_K_M | 5.5 GB | - | - | 75.0% |
| Q6_K | 6.3 GB | - | - | 84.1% |
| Q8_0 | 8.2 GB | - | - | 88.6% |

Q8_0 is the recommended variant: it preserves full-precision accuracy while halving the F16 footprint. Q4_K_M and Q5_K_M score 75.0%, about 14 pp below Q8_0, with the losses concentrated on the hardest compositional rows (see Known drift).
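The scoring rule described above (take the \boxed answer, compare it to the reference within ±2% relative tolerance) can be sketched as follows. This is a minimal illustration, not the published verifier: the function names are hypothetical, the SI-unit normalization step is omitted, and only a plain leading number inside the box is parsed.

```python
import re

# Matches a plain decimal or scientific-notation number such as 95.5 or 2.0e3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def extract_boxed(text):
    # Return the contents of the last \boxed{...} in text, or None if absent.
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never closed, e.g. a truncated response


def leading_number(boxed):
    # Parse the first number in the boxed string; units are ignored here
    # (assumption: the real verifier normalizes units to SI instead).
    match = _NUMBER.search(boxed.replace(",", ""))
    return float(match.group()) if match else None


def is_correct(response, expected, rel_tol=0.02):
    # True when the boxed value is within rel_tol of the reference value.
    boxed = extract_boxed(response)
    if boxed is None:
        return False
    value = leading_number(boxed)
    if value is None:
        return False
    return abs(value - expected) <= rel_tol * abs(expected)
```

A response with no box, or with an unclosed box, scores as wrong under this sketch, which is one way a verifier can also track the "boxed" and "truncation" rates separately from accuracy.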

How it stacks up

Kepler-Q8_0 against frontier cloud models on the same 44-row held-out, matched 4096-token budget, same \boxed ±2% verifier (temp 0.6 / top_p 0.95):

| Model | Where it runs | Accuracy | Boxed | Truncation | Mean output tokens |
|---|---|---|---|---|---|
| Kepler-Q8_0 (8B) | Local Spark, $0 | 84.1% | 100% | 0% | 166 |
| Claude Haiku 4.5 | Cloud API | 97.7% | 100% | 0% | 488 |
| Gemini 3.1 Flash-Lite | Cloud API | 95.5% | 100% | 0% | 464 |

The honest read: a local 8B specialist lands ~11–14 pp below frontier small cloud models on off-template numeric reasoning, while running fully offline at zero marginal cost and answering ~3× more concisely. The format reliability (100% boxed, 0% truncation) matches the frontier; the gap is pure accuracy on a handful of multi-step rows. (Kepler's matched-budget 84.1% here vs the 88.6% fidelity number above is run-to-run sampling variance, a difference of two problems out of 44; both land in the mid-to-high 80s.)
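To put the percentages above in terms of problem counts, the short sketch below converts each reported accuracy back to a count out of 44 and estimates a binomial standard error. The helper names are hypothetical, and the binomial model (independent, equally hard problems) is a simplifying assumption, not part of the published methodology.

```python
import math

N_PROBLEMS = 44  # size of the held-out set, from the tables above


def correct_count(accuracy_pct, n=N_PROBLEMS):
    # Number of problems answered correctly for a reported percentage.
    return round(accuracy_pct / 100 * n)


def binomial_se_pp(accuracy_pct, n=N_PROBLEMS):
    # Approximate standard error of the accuracy, in percentage points.
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


reported = [
    ("Kepler-Q8_0, matched budget", 84.1),
    ("Kepler-Q8_0, fidelity run", 88.6),
    ("Claude Haiku 4.5", 97.7),
    ("Gemini 3.1 Flash-Lite", 95.5),
]
for name, acc in reported:
    print(f"{name}: {correct_count(acc)}/{N_PROBLEMS} correct, "
          f"SE ~ {binomial_se_pp(acc):.1f} pp")
```

With n=44, each problem is worth about 2.3 pp, and the rough standard error near 84% is about 5.5 pp, so the two Kepler-Q8_0 runs (37 and 39 correct) sit well within one standard error of each other.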

Variants

| Variant | Recommended use |
|---|---|
| Q4_K_M | Smallest footprint; use when memory is tight and you can accept roughly 14 pp lower accuracy than Q8_0 (75.0% vs 88.6%). |
| Q5_K_M | Slightly higher quality than Q4_K_M for a modest size bump. |
| Q6_K | Near-lossless; a good middle ground if you have headroom. |
| Q8_0 | Recommended. Effectively lossless: best accuracy, fits the Spark envelope comfortably. |

How to run

Pull the recommended variant:

huggingface-cli download Orionfold/Kepler-GGUF model-Q8_0.gguf \
  --local-dir ./models/kepler

Serve it via llama-server (OpenAI-compatible API):

llama-server -m ./models/kepler/model-Q8_0.gguf \
  -c 4096 -ngl 99 -t 8 \
  --host 0.0.0.0 --port 8080
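Once the server is up, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library and assumes the server is reachable at http://localhost:8080 (matching the port above). The function names are hypothetical, the sampling settings mirror the benchmark configuration (temp 0.6 / top_p 0.95), and the request omits a `model` field on the assumption that llama-server does not need one when it serves a single model.

```python
import json
import urllib.request

# Assumption: llama-server was started locally with the command shown above.
BASE_URL = "http://localhost:8080"


def build_request(prompt, temperature=0.6, top_p=0.95):
    # Build a POST request for the OpenAI-compatible chat completions route.
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=300):
    # Send the prompt and return the assistant's reply text.
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("... Give your final answer as \\boxed{value unit}.")` returns the worked chain and the boxed answer as one string.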

Or run in-process via llama-cpp-python:

from llama_cpp import Llama

# Load the GGUF with a 4096-token context and all layers offloaded to the GPU.
llm = Llama(
    model_path="./models/kepler/model-Q8_0.gguf",
    n_ctx=4096, n_gpu_layers=99, chat_format="chatml",
)

# Ask one word problem and request the answer in the \boxed{} format.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "A satellite orbits Earth in a circular orbit at altitude 550 km. Compute its orbital period in minutes. Give your final answer as \\boxed{value unit}."}],
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
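For a sanity check on the model's reply to the example prompt, the reference value can be computed directly from Kepler's third law for a circular orbit, T = 2π·sqrt(a³/μ). The sketch below is an independent calculation, not part of the model or the benchmark; the Earth constants are assumed textbook values (the benchmark supplies its constants in-prompt, and the example prompt above does not).

```python
import math

# Assumed constants (not taken from the benchmark):
MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter, km^3/s^2
R_EARTH_KM = 6371.0            # mean Earth radius, km


def circular_period_minutes(altitude_km, mu=MU_EARTH_KM3_S2, radius_km=R_EARTH_KM):
    # Orbital period of a circular orbit at the given altitude, in minutes.
    a = radius_km + altitude_km  # orbital radius equals the semi-major axis
    period_s = 2 * math.pi * math.sqrt(a ** 3 / mu)
    return period_s / 60


period = circular_period_minutes(550.0)
print(f"{period:.1f} min")
```

With these constants the period comes out at about 95.5 minutes, so a boxed answer between roughly 93.6 and 97.4 minutes would pass a ±2% check.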

LM Studio loads the GGUF directly, and Ollama loads it via a Modelfile; neither needs any conversion step.

Known drift

Kepler is honest about where it misses. Across all quants, errors cluster on two families:

  • hohmann_transfer: two-burn orbital transfers (the most multi-step problems).
  • altitude_from_period: inverse Kepler (solving for orbital radius given the period).

These are an SFT coverage gap, not a precision artifact: they fail similarly at every quant level and were flagged by the headroom analysis as needing more training coverage rather than reinforcement learning. Treat Kepler's answers on multi-burn transfer problems as draft-quality and verify them.
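One way to verify answers in these two families is to recompute them from the closed-form relations. The sketch below assumes coplanar circular orbits around Earth, impulsive burns, and textbook Earth constants; the function names echo the problem families but are otherwise hypothetical and are not taken from the benchmark code.

```python
import math

# Assumed constants (the benchmark supplies its own in-prompt):
MU_EARTH_KM3_S2 = 398600.4418  # km^3/s^2
R_EARTH_KM = 6371.0            # km


def hohmann_delta_v(r1_km, r2_km, mu=MU_EARTH_KM3_S2):
    # Return (dv1, dv2) in km/s for a two-burn transfer between circular orbits.
    a_transfer = (r1_km + r2_km) / 2                 # transfer ellipse semi-major axis
    v_circ_1 = math.sqrt(mu / r1_km)                 # speed on the initial orbit
    v_circ_2 = math.sqrt(mu / r2_km)                 # speed on the final orbit
    v_dep = math.sqrt(mu * (2 / r1_km - 1 / a_transfer))  # vis-viva at departure
    v_arr = math.sqrt(mu * (2 / r2_km - 1 / a_transfer))  # vis-viva at arrival
    return abs(v_dep - v_circ_1), abs(v_circ_2 - v_arr)


def altitude_from_period(period_s, mu=MU_EARTH_KM3_S2, radius_km=R_EARTH_KM):
    # Invert Kepler's third law: circular-orbit altitude (km) for a given period.
    a = (mu * (period_s / (2 * math.pi)) ** 2) ** (1 / 3)
    return a - radius_km


dv1, dv2 = hohmann_delta_v(6678.0, 42164.0)  # roughly a 300 km orbit to GEO radius
print(f"dv1={dv1:.2f} km/s, dv2={dv2:.2f} km/s, total={dv1 + dv2:.2f} km/s")
```

Comparing the model's boxed value against these reference numbers with the same ±2% tolerance gives a quick pass/fail check for transfer and inverse-Kepler problems.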

Companion benchmark

The exact benchmark used above is published as a dataset, Orionfold/Kepler-bench: the problem pool, the held-out set, and the verifier-as-reward scorer, so you can reproduce these numbers.

Methods

Full methodology (the scout, the verifier-is-the-reward bench, the SFT corpus, the SFT-vs-RLVR decision, and the Spark-side measurement protocol): The Gate Before the GPU: Deciding SFT vs RL vs RLVR Before You Spend the Run.


Published by Orionfold LLC · orionfold.com · Methods documented at ainative.business/field-notes.
