Nice try, but the model is just not what it pretends to be...
It answers the question "How many 'r's are there in the word strawberry?" perfectly, even with no reasoning. I like this test question for its simplicity and because even large models struggle with it. Your 7B model gives the correct answer without reasoning. However, when we slightly change the question to "How many 'e's are there in the word blueberry?", it gives a wrong answer even when we ask for reasoning, which is direct proof that this model is not what it pretends to be.
Thank you for your attention.
We tried the case you mentioned. With greedy decoding, the model does indeed misremember the spelling of "blueberry": the letter sequence it actually counts is "b, l, u, e, e, r, b, e, r, r, y", which contains 3 'e's, whereas the correct answer is 2. This indicates that there are still some flaws in the model's overall reasoning.
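For reference, the ground truth for this case can be checked with a few lines of Python. This is only an illustrative sketch: `count_letter` is a hypothetical helper name, not part of the model or its tooling. It compares the real spelling of the word with the letter sequence quoted in the reply above.

```python
def count_letter(letters, target):
    """Count case-insensitive occurrences of `target` in a string or a list of letters."""
    return sum(1 for ch in letters if ch.lower() == target.lower())

# Correct spelling of the word from the question.
correct_count = count_letter("blueberry", "e")

# Letter sequence the model produced under greedy decoding, as quoted in the reply.
model_letters = ["b", "l", "u", "e", "e", "r", "b", "e", "r", "r", "y"]
model_count = count_letter(model_letters, "e")

print(correct_count, model_count)  # 2 3
```

The extra 'e' comes from the misspelled sequence rather than from a miscount, which fits the description of the model misremembering the word.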
However, we also explored pass@k accuracy by sampling at a temperature of 0.7. In 2 attempts, the model output the correct answer both times.
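As a rough sketch of how pass@k can be computed from sampled attempts, the snippet below uses the commonly used unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of samples drawn and c the number that were correct. Whether this exact estimator matches the evaluation used here is an assumption; the reply only reports 2 correct answers out of 2 attempts. The function name `pass_at_k` is hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k from n sampled attempts, of which c were correct.

    Uses 1 - C(n - c, k) / C(n, k): the probability that a random subset of
    k attempts (out of the n drawn) contains at least one correct answer.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)

# The case reported above: 2 attempts at temperature 0.7, both correct.
print(pass_at_k(n=2, c=2, k=1))  # 1.0
print(pass_at_k(n=2, c=2, k=2))  # 1.0
```

With only 2 samples the estimate is very coarse, so a larger n would be needed before drawing conclusions about reliability.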
Below are the screenshots of our outputs.

