VetJarvis-4B-Instruct
Language: 🇰🇷 Korean · 🇺🇸 English
🆕 Update
- VetJarvis 1.1 has been released, featuring significantly enhanced reasoning performance through an improved post-training pipeline.
- With thinking mode enabled, VetJarvis 1.1 approaches the performance of GPT-5.4-mini (no thinking).
- For details, please refer to the VetJarvis-1.1 model card → choonok/VetJarvis-1.1-4B-Instruct
1. Model Introduction
VetJarvis-4B-Instruct is a 4-billion-parameter domain-specific large language model (LLM) developed by CHOONOK COMPANY, designed for companion-animal (canine and feline) veterinary knowledge, education, and responsible AI research.
CHOONOK COMPANY, a Korea-based company, releases VetJarvis-4B-Instruct as a contribution to the veterinary research and education community, with the goal of supporting safe and responsible advancement of veterinary AI technologies.
- Research & education: freely available, no approval needed.
- Beyond research: we welcome broader use, including industry applications, through a lightweight safety review rather than a commercial gate.
- Clinical use: not permitted. VetJarvis is not a medical device and should not replace professional veterinary judgment.
We want this model to be genuinely useful - to researchers, educators, and eventually the broader veterinary ecosystem - while ensuring that its use remains responsible and expert-guided.
2. Model Configuration
Overview
- Model Name: VetJarvis-4B-Instruct
- Base Model: choonok/VetJarvis-4B-Base
- Architecture: Qwen3_5ForConditionalGeneration
- Number of Parameters: 4.2B (Language Model) + ~62M (MTP)
- Context Length: 8,192 tokens (SFT); 4,096 tokens (CPT stage)
- Training Hardware: NVIDIA B200 192GB × 4 (single node)
- Training Framework: NVIDIA Megatron-Bridge
- Knowledge Cutoff: Apr 2026
- Domain: Veterinary medicine
- Language: Korean, English
Training Pipeline
- Stage 1: Continual Pre-Training (CPT) --- Qwen/Qwen3.5-4B → VetJarvis-4B-Base
- Stage 2: Supervised Fine-Tuning (SFT) --- VetJarvis-4B-Base → VetJarvis-4B-Instruct
- MTP: Jointly trained during both CPT and SFT stages
Training Data
- VetJarvis-4B-Instruct was trained using veterinary reference materials and real-world clinical data. All clinical records were fully de-identified prior to use, with personal information of both animal guardians and veterinary personnel removed in accordance with data protection standards. General-domain Korean/English documents were also included to preserve the base model's general reasoning capabilities.
- Total Training Tokens: ~8.5B tokens
Training Hyperparameters
| Item | Value |
|---|---|
| Framework | NVIDIA Megatron-Bridge |
| Objective | Next-token prediction (causal LM) |
| Precision | BF16 mixed (FP32 master params + grads + Adam m/v) |
| Optimizer | Distributed Fused AdamW |
| LR Schedule | Cosine annealing |
| Max LR | 2e-5 |
| Min LR | 5e-6 |
| LR Warmup | 5% (≈ 863 iters) |
| Global Batch Size | 120 samples |
| Micro Batch Size | 3 |
| Sequence Length | 4,096 |
| Tokens per Step | 491,520 |
| Training Iterations | 17,264 |
| RNG Seed | 42 |
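As a quick consistency check, the per-step and total token counts follow directly from the batch configuration above (an illustrative calculation, not training code):

```python
# Values copied from the hyperparameter table above.
GLOBAL_BATCH = 120      # samples per optimizer step
SEQ_LEN = 4_096         # training sequence length
ITERATIONS = 17_264     # total optimizer steps

tokens_per_step = GLOBAL_BATCH * SEQ_LEN      # 491,520 (matches the table)
total_tokens = tokens_per_step * ITERATIONS   # ~8.49B (matches "~8.5B tokens")
print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.2f}B tokens total")
```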
Parallelism & Sharding
| Item | Value |
|---|---|
| Tensor Parallel (TP) | 1 |
| Pipeline Parallel (PP) | 1 |
| Data Parallel (DP) | 4 |
| Context Parallel (CP) | 1 |
| Sequence Parallel | False |
| Optimizer Sharding | ZeRO-1 (data_parallel_sharding_strategy=optim) |
| Gradient Accumulation Fusion | True |
| Cross-Entropy Loss Fusion | True (native impl) |
| Attention Backend | Flash-Attention |
| Activation Recomputation | None |
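Assuming the standard Megatron relation global_batch = micro_batch × data_parallel × accumulation_steps (an assumption consistent with the tables above, not a documented setting), the configuration implies 10 gradient-accumulation micro-steps per rank per optimizer step:

```python
# Batch sizes from the hyperparameter table; DP degree from the parallelism table.
GLOBAL_BATCH = 120
MICRO_BATCH = 3
DATA_PARALLEL = 4

accum_steps = GLOBAL_BATCH // (MICRO_BATCH * DATA_PARALLEL)
print(f"gradient accumulation steps: {accum_steps}")  # -> 10
```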
Hardware & Compute
| Item | Value |
|---|---|
| GPU | NVIDIA B200 192GB × 4 (Blackwell) |
| Node | Single-node (NVLink 5th gen) |
| CPU | 96 cores |
| System RAM | 944 GB |
| CUDA Cores per GPU | 18,944 |
Compute & Throughput
| Item | Value |
|---|---|
| Wall-clock Training Time | ~28.65 hours (≈ 29h) |
| Total GPU-hours (measured) | 114.6 B200 GPU-hour (28.65h × 4 GPU) |
| Total FLOPs (estimated) | ~2.04 × 10²⁰ FLOPs (6ND: N=4B, D=8.49B) |
| Per-GPU Throughput (avg) | ~494 TFLOP/s (BF16) |
| Model FLOPs Utilization (MFU) | ~22% |
| A100 Equivalent GPU-hour | ~454 A100 GPU-hour (BF16 312 TFLOP/s, MFU 40% assumed) |
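The estimates in this table can be reproduced with the standard 6ND approximation for training FLOPs. A minimal sketch follows; the B200 BF16 peak used for the MFU figure is an assumed value implied by the table (~494 TFLOP/s at ~22%), not a vendor specification:

```python
# 6*N*D approximation: N = parameters, D = training tokens.
N_PARAMS = 4e9
TOKENS = 8.49e9
WALL_HOURS = 28.65
NUM_GPUS = 4
B200_BF16_PEAK = 2.25e15   # assumed dense BF16 peak per GPU, FLOP/s
A100_BF16_PEAK = 312e12    # A100 dense BF16 peak, FLOP/s
A100_MFU = 0.40            # assumed utilization for the A100 equivalence

total_flops = 6 * N_PARAMS * TOKENS                            # ~2.04e20 FLOPs
gpu_seconds = WALL_HOURS * 3600 * NUM_GPUS                     # 114.6 GPU-hours
per_gpu_flops = total_flops / gpu_seconds                      # ~494 TFLOP/s per GPU
mfu = per_gpu_flops / B200_BF16_PEAK                           # ~22%
a100_hours = total_flops / (A100_BF16_PEAK * A100_MFU) / 3600  # ~454 GPU-hours

print(f"{total_flops:.2e} FLOPs | {per_gpu_flops / 1e12:.0f} TFLOP/s/GPU "
      f"| MFU {mfu:.0%} | ~{a100_hours:.0f} A100 GPU-hours")
```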
3. Quickstart
Serving VetJarvis-4B-Instruct
VetJarvis-4B-Instruct is a text-only LLM based on Qwen3.5-4B, and can be served in BF16 using a variety of frameworks including Hugging Face Transformers and vLLM.
Hugging Face Transformers
The Qwen3.5 architecture is natively supported in transformers>=5.5.

```bash
pip install -U "transformers>=5.5" accelerate torch flash-attn
```
```python
import torch
from transformers import AutoTokenizer, Qwen3_5ForConditionalGeneration

MODEL = "choonok/VetJarvis-4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    MODEL,
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are 'VetJarvis', a veterinarian-only AI assistant developed by CHOONOK COMPANY.",
    },
    {"role": "user", "content": "Please tell me the metronidazole protocol."},
]

# Render the chat template; thinking mode is off by default (see Recommended Parameters).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs.input_ids.shape[-1]:],
    skip_special_tokens=True,
))
```
On Blackwell GPUs (RTX 5090 / B200), a compatible build of flash-attn ≥ 2.7 is required. If you encounter build or compatibility issues, fall back to `attn_implementation="sdpa"`.
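A minimal fallback sketch (the exception types are assumptions about what a missing or incompatible flash-attn build raises; adjust to what you observe):

```python
import torch
from transformers import Qwen3_5ForConditionalGeneration

MODEL = "choonok/VetJarvis-4B-Instruct"

def load_model():
    # Prefer FlashAttention-2; drop to PyTorch SDPA if flash-attn is missing
    # or was built for a different GPU architecture.
    for impl in ("flash_attention_2", "sdpa"):
        try:
            return Qwen3_5ForConditionalGeneration.from_pretrained(
                MODEL,
                dtype=torch.bfloat16,
                device_map="auto",
                attn_implementation=impl,
            )
        except (ImportError, ValueError, RuntimeError) as err:
            print(f"{impl} unavailable ({err}); trying next backend")
    raise RuntimeError("no usable attention backend")

model = load_model()
```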
vLLM
Recommended for production or high-throughput use cases. vllm>=0.18 supports the Qwen3.5 architecture, and the jointly trained MTP layer can be used for speculative decoding to improve throughput.
pip install "vllm>=0.18"
Launch an OpenAI-compatible server:
```bash
vllm serve choonok/VetJarvis-4B-Instruct \
  --served-model-name VetJarvis-4B-Instruct \
  --port 8000 \
  --max-model-len 8192 \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
Example:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="VetJarvis-4B-Instruct",
    messages=[
        {
            "role": "system",
            "content": "You are 'VetJarvis', a veterinarian-only AI assistant developed by CHOONOK COMPANY.",
        },
        {
            "role": "user",
            "content": "Please tell me the metronidazole protocol.",
        },
    ],
    max_tokens=4096,
    temperature=0.8,
    top_p=0.9,
    # Forward chat-template options through the OpenAI-compatible API.
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(response.choices[0].message.content)
```
Recommended Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.8 |
| Top-p | 0.9 |
| max_new_tokens | 4,096 |
| enable_thinking | False (default) |
| Context length | ≤ 262,144 tokens |
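To reproduce the think-mode results reported in Section 4, switch the chat-template flag that both examples above set to False (a sketch; the generation settings stay the same):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("choonok/VetJarvis-4B-Instruct")
messages = [{"role": "user", "content": "Please tell me the metronidazole protocol."}]

# Same call as in the Transformers example, with thinking mode switched on.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # the card's examples default to False
)
```

With the OpenAI-compatible server, pass `{"chat_template_kwargs": {"enable_thinking": True}}` via `extra_body` instead.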
4. Evaluation Results
Main Benchmarks
To ensure the transparency and reproducibility of the evaluation, the benchmark for this model utilizes data from the 73rd to 77th (2022–2026) Japanese Veterinary National Examination, currently the only veterinary licensing exam globally that officially publishes its past questions. We constructed an evaluation subset (n=132) by filtering for questions directly related to small animal (canine and feline) clinical practice from Theory Exam A/B. For multiple-answer (all-correct) questions, the original grading criteria of the examining body were strictly observed.
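For illustration, the all-correct grading rule can be expressed as an exact match over the full choice set; the function below is a hypothetical sketch, not the actual evaluation harness:

```python
def score_question(predicted: set[str], answer_key: set[str]) -> int:
    """Return 1 only when every correct choice (and no extra one) is selected."""
    return int(predicted == answer_key)

# A multiple-answer question earns no partial credit:
assert score_question({"a", "c"}, {"a", "c"}) == 1
assert score_question({"a"}, {"a", "c"}) == 0
```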
In addition to VetJarvis, the comparison targets include GPT-5.4-mini (reasoning effort: minimal/medium), Qwen3.5-4B (the base model), Gemma-3n-E2B-it / E4B-it, and EXAONE-3.5-7.8B-Instruct.
Figure 1. Parameter size vs accuracy
Figure 1 compares mean accuracy against parameter count (log scale) on the small-animal clinical subset (n=132) of the Japanese Veterinary National Examination benchmark. The x-axis denotes model parameter count (1B–10B+, log scale), the y-axis denotes mean accuracy (%), and error bars indicate per-exam variance.
- The VetJarvis-4B family reaches 77.88% (standard) and 81.67% (think mode) at the 4B scale, outperforming similarly sized open local models Qwen3.5-4B (62.58%) and Gemma-E4B (56.97%, effective 4.5B) by more than 15–25 percentage points.
- It also exceeds EXAONE-3.5-7.8B (47.58%)—a model 1.7–2× its size—by over 30 percentage points, clearly departing from the local-model trend line.
- Compared to closed models, the gap versus GPT-5.4-mini (86.82%) and GPT-5.4-mini · think (93.94%) is only about 5–9 percentage points, suggesting that VetJarvis-4B approaches a comparable performance envelope despite the 2×+ difference in parameter scale.
In short, VetJarvis-4B clearly demonstrates the value of veterinary-domain-specialized training relative to open local models at the same parameter scale, and enabling think mode yields additional accuracy gains.
Figure 2. Per-exam accuracy comparison table
Figure 2 presents the same evaluation as Figure 1 in a detailed table ranked by accuracy across eight models, reporting per-session accuracy for the 73rd–77th Japanese Veterinary National Exams, mean accuracy, standard deviation (σ), min–max range, and gap versus GPT-5.4-mini (think mode) (Δ vs GPT-5.4-mini-think).
- The top two ranks are taken by the closed models GPT-5.4-mini (medium thinking) at 93.94% and GPT-5.4-mini (no thinking) at 86.82%.
- VetJarvis-4B-THINK-IT (thinking ON) at 81.67% ranks 3rd overall, and thinking OFF at 77.88% ranks 4th, with gaps to GPT-5.4-mini (think) of -12.27 pp and -16.06 pp respectively.
- Against the same 4B-scale Qwen3.5-4B (62.58%), VetJarvis-4B leads by +19.09 pp (think ON) and +15.30 pp (think OFF).
- Other open local models trail by wide margins relative to GPT-5.4-mini (think): Gemma-3n-E4B-it 56.97% (-36.97 pp), EXAONE-3.5-7.8B 47.58% (-46.36 pp), and Gemma-3n-E2B-it 43.18% (-50.76 pp).
- The per-session standard deviation (σ) of VetJarvis-4B (1.73–2.17) is comparable to GPT-5.4-mini (no thinking, 1.98), indicating that stability across exam sessions is on par with closed models.
In summary, VetJarvis-4B records the highest accuracy among 4B-parameter open local models, narrowing the gap to leading closed models to a 10–16 pp range and reaffirming the practical effectiveness of veterinary-domain-specialized training.
Summary
The VetJarvis-4B-Instruct model consistently outperforms comparable open local models (Qwen3.5-4B, Gemma-3n-E4B, EXAONE-3.5-7.8B) across hematology/oncology (98%), orthopedics (89%), endocrinology (88%), infectious diseases/zoonoses (78%), and respiratory (74%) domains, with margins ranging from 5 to 55 percentage points. It shows particular strength in histopathological classification, differential diagnosis of non-erosive arthritis, judgment on procedural indications (e.g., lobectomy for lung lobe torsion), and clinical decision-making that incorporates regional epidemiological data (such as the current status of feline-to-human zoonoses in Japan).
5. Limitation & License
Limitations
Not a Medical Device: This model is not a medical device evaluated or approved by any regulatory authority.
Prohibition of Clinical Use: The model's outputs cannot replace clinical decision-making, such as diagnosis, prescription, or treatment for actual animal patients, and direct use for such purposes is strictly prohibited.
Hallucination: Due to the technical nature of generative AI, the model may produce factually incorrect or veterinarily inaccurate information (hallucinations). Blind trust in the model's outputs is dangerous, and cross-verification by veterinary professionals is essential.
Knowledge Cutoff Limitations: As this model was trained on data up to a specific point in time (Knowledge Cutoff), it may not reflect the latest veterinary research papers, new drug information, or developments published thereafter.
Disclaimer of Liability: CHOONOK COMPANY assumes no legal liability for any decisions made by the licensee utilizing the model's outputs, nor for any direct or indirect damages resulting from such use.
License
Applicable License: This model is governed by the CHOONOK COMPANY VetJarvis Model License Agreement 1.0 - NC.
Non-Commercial Research & Education Only: Commercial use for revenue generation is prohibited; use is permitted solely for veterinary academic research, education, and technical evaluation.
Approval Process: For uses beyond the Permitted Use, users must obtain prior written approval from CHOONOK COMPANY based on a 'safety and appropriateness review' rather than commercial criteria.
Detailed Terms: Users must fully read and understand the complete text of the LICENSE before use.
6. Contact
- Email: admin@choonokcompany.com