Libraries

How to use EpistemeAI/Reason-Medical-20b-4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EpistemeAI/Reason-Medical-20b-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EpistemeAI/Reason-Medical-20b-4bit")
model = AutoModelForCausalLM.from_pretrained("EpistemeAI/Reason-Medical-20b-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use EpistemeAI/Reason-Medical-20b-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EpistemeAI/Reason-Medical-20b-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EpistemeAI/Reason-Medical-20b-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
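
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_payload` and `chat` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "EpistemeAI/Reason-Medical-20b-4bit"


def build_chat_payload(question: str, model: str = MODEL_ID) -> dict:
    """Build the JSON body for the OpenAI-compatible chat completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def chat(question: str, base_url: str = "http://localhost:8000") -> str:
    """Send one user message to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same question as the curl command. For a server on another port, pass a different `base_url`, for example `http://localhost:30000`.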

Use Docker

docker model run hf.co/EpistemeAI/Reason-Medical-20b-4bit

SGLang

How to use EpistemeAI/Reason-Medical-20b-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EpistemeAI/Reason-Medical-20b-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EpistemeAI/Reason-Medical-20b-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EpistemeAI/Reason-Medical-20b-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EpistemeAI/Reason-Medical-20b-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use EpistemeAI/Reason-Medical-20b-4bit with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Reason-Medical-20b-4bit to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Reason-Medical-20b-4bit to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EpistemeAI/Reason-Medical-20b-4bit to start chatting

Load model with FastModel

# Install Unsloth from pip first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="EpistemeAI/Reason-Medical-20b-4bit",
    max_seq_length=2048,
)

Docker Model Runner
How to use EpistemeAI/Reason-Medical-20b-4bit with Docker Model Runner:
```
docker model run hf.co/EpistemeAI/Reason-Medical-20b-4bit
```

Introduction

EpistemeAI/Reasoning-Medical-20B is designed for advanced medical reasoning across professional medicine, medical genetics, college biology and medicine, and clinical knowledge. It was fine-tuned on 370,000 high-quality question-and-answer examples that incorporate Chain-of-Thought reasoning to improve step-by-step problem solving. Training used the supervised fine-tuning (SFT) trainer with Unsloth optimizations for efficient fine-tuning.

Reasoning-Medical-20B

Model Summary

Reasoning-Medical-20B is a medical reasoning language model fine-tuned from openai/gpt-oss-20b. The model is designed for biomedical question answering, medical reasoning research, clinical knowledge evaluation, and safety-aligned medical assistant experiments.

This model is intended for research and development use only. It is not intended to directly provide clinical diagnosis, treatment decisions, medication dosing, patient management instructions, or emergency medical guidance.

Model Type

This model is a decoder-only Transformer causal language model based on openai/gpt-oss-20b.

The base architecture uses:

model_type: gpt_oss
architectures: GptOssForCausalLM
Task type: causal language modeling / text generation
Architecture family: sparse Mixture-of-Experts language model
Base model: openai/gpt-oss-20b
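
One way to confirm these fields is to inspect the repository's config.json. The sketch below assumes the file has already been downloaded to a local path. `summarize_config`, `is_gpt_oss_causal_lm`, and `load_config` are hypothetical helpers, and the sample dictionary contains only the two fields listed above rather than a full configuration.

```python
import json
from pathlib import Path


def summarize_config(config: dict) -> dict:
    """Pick out the fields that identify the architecture."""
    return {
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
    }


def is_gpt_oss_causal_lm(config: dict) -> bool:
    """True when the config matches the values stated in this model card."""
    summary = summarize_config(config)
    return (
        summary["model_type"] == "gpt_oss"
        and "GptOssForCausalLM" in summary["architectures"]
    )


def load_config(path: str) -> dict:
    """Read a local config.json file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Only the two fields named above; a real config.json has many more.
sample = {"model_type": "gpt_oss", "architectures": ["GptOssForCausalLM"]}
print(is_gpt_oss_causal_lm(sample))
```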

Intended Use

This model may be useful for:

Biomedical research question answering
Medical exam-style reasoning
Medical benchmark evaluation
Clinical knowledge retrieval experiments
Differential diagnosis reasoning research
Medical education support
Safety-aligned medical AI research

Out-of-Scope Use

This model should not be used for:

Direct clinical diagnosis
Direct patient treatment planning
Medication dosage recommendations
Emergency medical decision-making
Replacing a licensed medical professional
Autonomous clinical triage
High-stakes patient management without expert review

All outputs should be treated as preliminary, require independent verification, and should be reviewed by qualified medical professionals before any real-world clinical application.

Training Details

The model was fine-tuned from openai/gpt-oss-20b using a medical reasoning dataset containing high-quality question-answer examples.

Training may include one or more of the following stages:

Supervised fine-tuning on 370,000 high-quality medical question-answer examples

Training Data

Training Dataset

This model was fine-tuned using:

Dataset: lingshu-medical-mllm/ReasonMed
Dataset type: medical reasoning question-answer dataset
Language: English
License: Apache-2.0
Task categories: question answering and text generation
Domain: medical reasoning, biomedical knowledge, clinical-style QA
Format: question-answer examples with reasoning-oriented responses

ReasonMed is a large-scale medical reasoning dataset containing high-quality medical question-answer examples with multi-step reasoning rationales and concise answer summaries. It was generated and curated through a multi-agent pipeline designed to improve correctness, logical coherence, and medical factuality.

Safety Alignment

This model should be aligned to prefer responses that:

Express uncertainty when appropriate
Avoid unsupported medical claims
Recommend professional medical consultation for serious symptoms
Avoid definitive diagnosis without sufficient context
Avoid medication dosage or prescription advice
Refuse unsafe medical, biological, or harmful instructions
Provide safe, educational, and non-actionable alternatives when needed

Safety tuning may include DPO-style preference pairs where the chosen answer is safer, more cautious, and more clinically appropriate than the rejected answer.
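
A preference pair of this kind can be sketched as a record with `prompt`, `chosen`, and `rejected` fields, which are the column names used by TRL's DPO trainer. The example text below is invented for illustration and is not taken from any training data. `PreferencePair` and `validate_pair` are hypothetical helpers.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # safer, more cautious answer
    rejected: str  # answer the model should learn to avoid


def validate_pair(pair: PreferencePair) -> None:
    """Reject records that are empty or that carry no preference signal."""
    for name, value in asdict(pair).items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    if pair.chosen.strip() == pair.rejected.strip():
        raise ValueError("chosen and rejected must differ")


# Illustrative example only (invented text, not from the training set).
pair = PreferencePair(
    prompt="I have chest pain. What medication dose should I take?",
    chosen=(
        "Chest pain can be serious. I can't recommend a dose; please contact "
        "emergency services or a clinician promptly."
    ),
    rejected="Take a double dose of whatever painkiller you have at home.",
)
validate_pair(pair)
record = asdict(pair)  # plain dict with prompt / chosen / rejected keys
```

Validating each record before training catches pairs where both answers are identical, which would give a preference objective nothing to learn from.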

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EpistemeAI/Reason-Medical-20b-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in the checkpoint's dtype and place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "You are a careful medical reasoning assistant. Provide educational information only. Do not provide definitive diagnosis or treatment."
    },
    {
        "role": "user",
        "content": "How can bacterial pneumonia be differentiated from viral pneumonia?"
    }
]

# Return a dict (input_ids and attention_mask) so it can be unpacked into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens, not the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))

Recommended Prompt Format

Reasoning: high.

You are a careful medical reasoning assistant. Your response is for educational and research purposes only. Do not provide a final clinical diagnosis, prescription, dosage, or treatment plan. Explain uncertainty, red flags, and when to seek professional medical care.

Question:
{user_question}
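
The template above can be turned into chat messages with a small helper. This is a sketch under one assumption: the card does not say how the template maps to chat roles, so placing the reasoning level and instructions in the system message and the question in the user message is a choice made here for illustration. `build_messages` is a hypothetical name.

```python
SYSTEM_PROMPT = (
    "Reasoning: high.\n\n"
    "You are a careful medical reasoning assistant. Your response is for "
    "educational and research purposes only. Do not provide a final clinical "
    "diagnosis, prescription, dosage, or treatment plan. Explain uncertainty, "
    "red flags, and when to seek professional medical care."
)


def build_messages(user_question: str) -> list[dict[str, str]]:
    """Fill the recommended prompt format with one user question."""
    question = user_question.strip()
    if not question:
        raise ValueError("user_question must not be empty")
    return [
        # Assumption: instructions go in the system role.
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Question:\n{question}"},
    ]


messages = build_messages(
    "How can bacterial pneumonia be differentiated from viral pneumonia?"
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in the same way as in the Example Usage section.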

Evaluation

The model should be evaluated on both capability and safety benchmarks.

Suggested evaluation categories:

Category	Example Benchmarks
Medical QA	MedQA, MedMCQA, PubMedQA
Biomedical reasoning	MMLU medical subsets, MMLU-Pro biology/medicine
Clinical safety	Custom unsafe-medical-advice tests
Hallucination	Citation and factuality checks
Refusal behavior	Unsafe medical, biosecurity, and self-harm prompts
Calibration	Uncertainty and confidence evaluation
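
For the multiple-choice medical QA category, scoring can be sketched as answer extraction followed by exact-match accuracy. The sketch assumes the model ends its response with a marker such as "Answer: B". That output format is an assumption made here, not something the benchmarks or this card prescribe, and `extract_choice` and `accuracy` are hypothetical helpers rather than part of any evaluation harness.

```python
import re
from typing import Optional, Sequence

# Matches "Answer: B" or "answer: (C)"; the option letter itself is case-sensitive.
_ANSWER_RE = re.compile(r"(?i:answer)\s*:\s*\(?([A-E])\b")


def extract_choice(text: str) -> Optional[str]:
    """Return the letter from the last 'Answer: X' marker, or None if absent."""
    matches = _ANSWER_RE.findall(text)
    return matches[-1] if matches else None


def accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of outputs whose extracted letter equals the gold letter."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        raise ValueError("no examples to score")
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


# Invented outputs for illustration: one correct, one wrong, one unparseable.
outputs = ["Step-by-step reasoning. Answer: B", "answer: (C)", "I am not sure."]
gold = ["B", "D", "A"]
score = accuracy(outputs, gold)
```

Responses without a parseable answer count as incorrect here, which keeps the score conservative.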

Current reported results:

Benchmark	Reasoning-Medical-20B	gpt-oss-20b
MedQA	67	62
HealthBench	42.5	42.5

Limitations

This model may:

Produce incorrect or hallucinated medical information
Overstate confidence
Miss rare diagnoses or uncommon clinical presentations
Fail to identify emergency symptoms
Provide incomplete differential diagnoses
Reflect biases present in the training data
Perform differently across demographic groups, specialties, and clinical contexts

The model is not a substitute for professional medical judgment.

Medical Disclaimer

The outputs generated by this model are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice application. Performance benchmarks highlight baseline capabilities on relevant tasks, but inaccurate model output is possible. All outputs should be considered preliminary and require independent verification, clinical correlation, and further investigation through established research and development methodologies.

If you are experiencing a medical emergency, contact emergency services or a qualified healthcare professional immediately.

Ethical Considerations

Users should carefully consider the risks of deploying medical AI systems in real-world settings. This model should be used with human oversight, transparent limitations, evaluation against clinically relevant safety tests, and appropriate governance.

Developers should avoid using the model in workflows where incorrect outputs could directly harm patients.

Citation

If you use this model, please cite the base model and this fine-tuned model.

@misc{reasoningmedical20b,
  title = {Reasoning-Medical-20B},
  author = {EpistemeAI},
  year = {2026},
  publisher = {Hugging Face},
  note = {Fine-tuned from openai/gpt-oss-20b}
}

License

This model is released under the Apache-2.0 license unless otherwise specified by the fine-tuning data, adapter weights, or downstream distribution requirements.

Users are responsible for ensuring that their use complies with the base model license, dataset licenses, and applicable laws or regulations.

Contact

For questions, issues, or research collaboration, contact:

Organization: EpistemeAI
Hugging Face: EpistemeAI

Uploaded finetuned model

Developed by: EpistemeAI
License: apache-2.0
Finetuned from model: unsloth/gpt-oss-20b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Safetensors

Model size: 22B params
Tensor type: BF16
