Kimina Prover
Part of the Kimina Prover collection, State-of-the-Art Models for Formal Mathematical Reasoning: https://huggingface.co/blog/AI-MO/kimina-prover
How to use AI-MO/Kimina-Prover-72B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="AI-MO/Kimina-Prover-72B")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AI-MO/Kimina-Prover-72B")
model = AutoModelForCausalLM.from_pretrained("AI-MO/Kimina-Prover-72B")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
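A 72B checkpoint will not fit on a single consumer GPU. A minimal sketch of a sharded load, assuming accelerate is installed (device_map and torch_dtype are standard from_pretrained arguments; the multi-GPU layout is an assumption about your hardware):

from transformers import AutoModelForCausalLM

# Shard the 72B weights across all visible GPUs (assumes `pip install accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    "AI-MO/Kimina-Prover-72B",
    torch_dtype="auto",  # keep the checkpoint's native dtype instead of fp32
    device_map="auto",   # let accelerate place layers across the available devices
)

How to use AI-MO/Kimina-Prover-72B with vLLM: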
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AI-MO/Kimina-Prover-72B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AI-MO/Kimina-Prover-72B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
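Since the server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch using the openai client (assumes `pip install openai`; vLLM accepts any placeholder API key):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="AI-MO/Kimina-Prover-72B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)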
How to use AI-MO/Kimina-Prover-72B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "AI-MO/Kimina-Prover-72B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AI-MO/Kimina-Prover-72B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the SGLang server in Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "AI-MO/Kimina-Prover-72B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AI-MO/Kimina-Prover-72B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

How to use AI-MO/Kimina-Prover-72B with Docker Model Runner:
docker model run hf.co/AI-MO/Kimina-Prover-72B
Kimina-Prover-72B is a theorem-proving model developed by the Project Numina and Kimi teams, focused on competition-style problem solving in Lean 4. It is trained with large-scale reinforcement learning and reaches 84.0% accuracy at Pass@32 on MiniF2F-test.
For advanced usage examples, see https://github.com/MoonshotAI/Kimina-Prover-Preview/tree/master/kimina_prover_demo
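Pass@32 means 32 proof attempts are sampled per problem, and a problem counts as solved if at least one attempt is accepted by the Lean checker. If you evaluate the model yourself, the standard unbiased pass@k estimator can be computed as below; this sketch is illustrative and not part of the official evaluation code:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k draws
    from n samples (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 64 samples per problem, 10 of which verify.
print(pass_at_k(64, 10, 32))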
You can easily do inference using vLLM:
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_name = "AI-MO/Kimina-Prover-72B"
model = LLM(
model=model_name,
tensor_parallel_size=8, # Should have 8 GPUs on this node
max_model_len=131072,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Raw strings keep LaTeX backslashes (e.g. \frac) from being read as Python escapes.
problem = r"The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume?"
formal_statement = r"""import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.-/
theorem mathd_algebra_478 (b h v : ℝ) (h₀ : 0 < b ∧ 0 < h ∧ 0 < v) (h₁ : v = 1 / 3 * (b * h))
    (h₂ : b = 30) (h₃ : h = 13 / 2) : v = 65 := by
"""
prompt = "Think about and solve the following problem step by step in Lean 4."
prompt += f"\n# Problem: {problem}"
prompt += f"\n# Formal statement:\n```lean4\n{formal_statement}\n```\n"
messages = [
{"role": "system", "content": "You are an expert in mathematics and proving theorems in Lean 4."},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8096)
output = model.generate(text, sampling_params=sampling_params)
output_text = output[0].outputs[0].text
print(output_text)
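The response interleaves chain-of-thought reasoning with Lean 4 code. Assuming the final proof is wrapped in a ```lean4 fence (as in the prompt above), a minimal sketch for pulling it out; the regex is an illustration, not part of the official demo:

import re

# Grab the last ```lean4 ... ``` block, which typically holds the final proof.
blocks = re.findall(r"```lean4\n(.*?)```", output_text, re.DOTALL)
if blocks:
    lean_proof = blocks[-1]
    print(lean_proof)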