Instructions for using jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 with libraries and local apps.

Libraries

How to use jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jlzhou/Qwen2.5-3B-Infinity-Instruct-0625")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jlzhou/Qwen2.5-3B-Infinity-Instruct-0625")
model = AutoModelForCausalLM.from_pretrained("jlzhou/Qwen2.5-3B-Infinity-Instruct-0625")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
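The pipeline call in the first snippet returns its result without printing it. The sketch below shows one way to pull the assistant reply out of that result. It assumes the chat-style output shape used by recent Transformers releases (a list with one dict whose "generated_text" holds the conversation, with the assistant message appended last). The helper name `last_reply` and the sample data are hypothetical, used only for illustration.

```python
def last_reply(result):
    """Return the assistant text from a chat-style text-generation result."""
    # Assumed shape: [{"generated_text": [ {...user...}, {...assistant...} ]}]
    conversation = result[0]["generated_text"]
    return conversation[-1]["content"]


# Hypothetical stand-in for the value returned by pipe(messages)
example_result = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply)"},
        ]
    }
]

print(last_reply(example_result))
```

With a loaded pipeline, the equivalent call would be `last_reply(pipe(messages))`.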

Local Apps

vLLM

How to use jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
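The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style chat completion body. The helper names `build_chat_request` and `ask` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the assistant's reply text.

    Requires a running server; assumes an OpenAI-compatible response body.
    """
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs the server to be up):
# print(ask("What is the capital of France?"))
```

Passing a different `base_url`, such as `http://localhost:30000`, points the same helper at another OpenAI-compatible server.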

Use Docker

docker model run hf.co/jlzhou/Qwen2.5-3B-Infinity-Instruct-0625

SGLang

How to use jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 with Docker Model Runner:
```
docker model run hf.co/jlzhou/Qwen2.5-3B-Infinity-Instruct-0625
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Model Card for jlzhou/Qwen2.5-3B-Infinity-Instruct-0625

Model Details

This model was fine-tuned from Qwen/Qwen2.5-3B on the BAAI/Infinity-Instruct dataset (subset 0625). It is the model produced in the accompanying blog post, which has more details.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jlzhou/Qwen2.5-3B-Infinity-Instruct-0625"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
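The list comprehension near the end of the code above exists because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. Slicing each output at the prompt length keeps only the new tokens. Here is a small sketch of that step with plain Python lists. The token IDs are made up, and `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the echoed prompt so only newly generated token IDs remain."""
    return output_ids[len(input_ids):]


prompt_ids = [101, 102, 103]             # made-up prompt token IDs
full_output = [101, 102, 103, 7, 8, 9]   # prompt followed by three new tokens

new_tokens = strip_prompt(prompt_ids, full_output)
print(new_tokens)
```

Without this step, the decoded response would begin with the system and user text from the prompt.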

Training Details

Training Data

This model is trained on https://huggingface.co/datasets/BAAI/Infinity-Instruct

Training Hyperparameters

This model follows the recommended hyperparameters from https://huggingface.co/BAAI/Infinity-Instruct-3M-0625-Qwen2-7B#training-details


Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	16.61
IFEval (0-Shot)	35.58
BBH (3-Shot)	26.91
MATH Lvl 5 (4-Shot)	2.04
GPQA (0-shot)	2.57
MuSR (0-shot)	8.13
MMLU-PRO (5-shot)	24.43
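The Avg. row is consistent with an unweighted mean of the six benchmark scores. That reading is an assumption based on the numbers in the table, and it can be checked with a few lines:

```python
# Scores copied from the table above
scores = {
    "IFEval (0-Shot)": 35.58,
    "BBH (3-Shot)": 26.91,
    "MATH Lvl 5 (4-Shot)": 2.04,
    "GPQA (0-shot)": 2.57,
    "MuSR (0-shot)": 8.13,
    "MMLU-PRO (5-shot)": 24.43,
}

# Unweighted mean, rounded to two decimals like the table
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```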

Model size: 3B params (Safetensors)
Tensor type: F16
Base model: Qwen/Qwen2.5-3B
Quantizations: 2 models

Evaluation results (Open LLM Leaderboard)

Benchmark	Metric	Value
IFEval (0-Shot)	strict accuracy	35.58
BBH (3-Shot)	normalized accuracy	26.91
MATH Lvl 5 (4-Shot)	exact match	2.04
GPQA (0-shot)	acc_norm	2.57
MuSR (0-shot)	acc_norm	8.13
MMLU-PRO (5-shot)	accuracy (test set)	24.43