ErnestoOjeda committed on May 5, 2025

Commit e9fc8f7 (verified) · 1 Parent(s): f075f6b

Update README.md

Files changed (1)

README.md +151 -3

@@ -1,3 +1,151 @@
----
-license: mit
----

+---
+# 🪐 Circe-1.5B
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+  - bilingual
+  - lora
+  - rl
+  - cost-efficient
+  - tiny-models
+language:
+  - en
+  - es
+---
+<!-- center-aligned, capped at 420 px wide × 240 px tall -->
+<p align="center">
+  <img
+    src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
+    alt="Circe-1.5B schematic"
+    width="420"
+    height="240"
+  />
+</p>
+**Circe-1.5B** is a single-checkpoint, 1.5 B-parameter language model that asks a simple question:
+> _“How far can you push tiny models on a tiny budget?”_
+| ⚙️ Spec | Value |
+|---------|-------|
+| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
+| Trainable params | 4 M (LoRA) |
+| Post-training cost | **≈ US $12** on 1×L40S |
+| Training recipe | 8 h SFT → 4 h GRPO |
+| Context length | up to **4 k tokens** (tested) |
+| RAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
+| Throughput | ~55 tok / s on 1×A6000 (fp16, no compile) |
+It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU.
+We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.
+---
+# 🔭 Intended Use
+* **Base for new LoRAs** — domain adaptation, longer-context studies.
+* **Research** into cost-efficient RL for reasoning.
+* **Not** for high-stakes or production tasks.
+See the [⚙️ Limitations](#️-limitations--bias) section before use.
+---
+# ⚡ Quickstart
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype="bfloat16")
+tok   = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")
+prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
+out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
+print(tok.decode(out[0], skip_special_tokens=True))
+```
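The Quickstart prompt above writes the chat markers by hand. A possible alternative, sketched here under the assumption that the checkpoint's tokenizer ships a chat template, is to let `apply_chat_template` build the prompt. `build_messages` and `generate_reply` are hypothetical helper names and are not part of the repository.

```python
from typing import Dict, List


def build_messages(user_text: str) -> List[Dict[str, str]]:
    """Wrap a single user turn in the role/content format chat templates expect."""
    return [{"role": "user", "content": user_text}]


def generate_reply(user_text: str, model_id: str = "PaletLabs/Circe-1.5B",
                   max_new_tokens: int = 64) -> str:
    """Generate one reply; downloads the checkpoint on first call."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    # Assumption: the tokenizer defines a chat template; if not, this call raises.
    inputs = tok.apply_chat_template(
        build_messages(user_text),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt tokens so only the newly generated text is decoded.
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply` with the Spanish question from the Quickstart would return only the completion, whereas the Quickstart decodes the prompt and the completion together.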
+---
+# 🛠️ Installation
+```bash
+git clone https://github.com/palet-global/circe
+cd circe
+python -m venv venv && source venv/bin/activate
+pip install .
+```
+## 🏗️ Re-Training Pipeline
+### Data
+```bash
+python data/fetch_datasets.py --out data/processed
+```
+### Supervised LoRA
+```bash
+accelerate config default            # one-time
+accelerate launch train/sft.py \
+  --data_dir data/processed \
+  --output_dir checkpoints/sft
+```
+### RL (GRPO)
+```bash
+accelerate launch train/rl_grpo.py \
+  --data_dir data/processed \
+  --output_dir checkpoints/grpo \
+  --init_ckpt checkpoints/sft/checkpoint-13000 \
+  --num_steps 3000 --save_steps 500 --group 4
+```
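The `--group 4` flag presumably sets how many completions are sampled per prompt. As a rough illustration of the group-relative idea behind GRPO (not the contents of `train/rl_grpo.py`, which is not shown here), the sketch below normalizes each completion's reward against the mean and standard deviation of its own group. The function name and the `eps` constant are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (matching --group 4), each scored 0 or 1.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get a positive advantage and the rest a negative one, so no separate value network is needed to form a baseline.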
+### Merge and Tokenizer
+```bash
+python train/merge_lora.py \
+  --ckpt_dir checkpoints/grpo \
+  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+```
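Merging a LoRA adapter conceptually folds the low-rank update back into the base weight: W' = W + (alpha / r) * B * A. The sketch below shows that arithmetic on plain Python lists. It illustrates the standard LoRA formula rather than the logic of `train/merge_lora.py`, and every name and shape in it is made up for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A, where B is (out x r) and A is (r x in)."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / r
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 identity weight with a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # out x r
lora_a = [[0.5, 0.5]]     # r x in
merged = merge_lora(w, lora_b, lora_a, alpha=2.0, r=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is what makes a single-checkpoint release possible.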
+### SQuAD Sanity Checks
+```bash
+python eval/quick_squad_eval.py --model ./merged --dataset squad
+python eval/quick_squad_eval.py --model ./merged --dataset squad_es
+```
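SQuAD-style checks usually report exact match and token-level F1 between the predicted and reference answers. The script `eval/quick_squad_eval.py` is not shown here, so the following is only a sketch of those two metrics in their common form. The normalization (lowercasing, stripping punctuation and English articles) follows the usual SQuAD convention and would likely need adjusting for `squad_es`.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts the tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

F1 gives partial credit when a model wraps the right answer in extra words, which is common for chat-tuned checkpoints, while exact match does not.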
+### Upload
+```bash
+python train/upload_to_hub.py \
+  --model_dir merged \
+  --repo PaletLabs/Circe-1.5B \
+  --token $HF_TOKEN
+```
+---
+# 💻 Hardware & Inference Tips
+- **bf16 / fp16**: Needs ~9 GB VRAM.
+- **4-bit GPTQ**: < 3 GB; `bitsandbytes` works out-of-the-box.
+- Compile once (`torch.compile`) for **+10–15 %** throughput.
+---
+# ✍️ Current Evaluation Status
+Formal **lighteval / MMLU / GSM-8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.
+---
+# ⚙️ Limitations & Bias
+- No reward-model alignment — outputs may be unsafe or hallucinate.
+- Long-context (> 4 k) stability untested.
+- Training data bias from public QA pairs; Spanish coverage favors Latin-American variants.
+- Minimal safety filters — **you** must wrap with your own guardrails for production.
+---
+# 🔮 Roadmap
+- Publish full reasoning benchmark suite & eval scripts.
+- Release code-reasoning and doc-QA adapters.
+- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.
+---
+# 🪪 License
+This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.