Instructions to use junzzhu/atoMixtral-58K-5x5-DigitMesh with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use junzzhu/atoMixtral-58K-5x5-DigitMesh with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="junzzhu/atoMixtral-58K-5x5-DigitMesh")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("junzzhu/atoMixtral-58K-5x5-DigitMesh")
model = AutoModelForCausalLM.from_pretrained("junzzhu/atoMixtral-58K-5x5-DigitMesh")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use junzzhu/atoMixtral-58K-5x5-DigitMesh with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "junzzhu/atoMixtral-58K-5x5-DigitMesh"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```shell
docker model run hf.co/junzzhu/atoMixtral-58K-5x5-DigitMesh
```
- SGLang
How to use junzzhu/atoMixtral-58K-5x5-DigitMesh with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "junzzhu/atoMixtral-58K-5x5-DigitMesh" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "junzzhu/atoMixtral-58K-5x5-DigitMesh" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use junzzhu/atoMixtral-58K-5x5-DigitMesh with Docker Model Runner:
```shell
docker model run hf.co/junzzhu/atoMixtral-58K-5x5-DigitMesh
```
# AtoMixtral-58K-5x5-DigitMesh
A minimal ~58K-parameter Mixture-of-Experts (MoE) model for 5×5 digit-mesh recognition, built on the MixtralForCausalLM architecture.
## Model Description
AtoMixtral-58K-5x5-DigitMesh is an ultra-lightweight MoE causal language model for efficient digit recognition from 5×5 binary mesh patterns. With only 58K parameters and 2 experts, this "atom-sized" MoE model demonstrates effective pattern recognition with sparse expert activation.
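The sparse expert activation mentioned above can be illustrated with a minimal top-1 routing sketch. This is purely illustrative: the random router weights and gating below are toy assumptions, not the model's actual parameters.

```python
import math
import random

random.seed(0)

HIDDEN = 32      # hidden size, matching the spec below
NUM_EXPERTS = 2  # two local experts
                 # top-k = 1: only one expert runs per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy router: a linear layer mapping the hidden state to one logit per expert.
router_w = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

def route(hidden_state):
    """Return (expert_index, gate_weight) for top-1 routing."""
    logits = [sum(w * h for w, h in zip(row, hidden_state)) for row in router_w]
    probs = softmax(logits)
    expert = max(range(NUM_EXPERTS), key=lambda i: probs[i])
    return expert, probs[expert]

token_hidden = [random.gauss(0, 1) for _ in range(HIDDEN)]
expert, gate = route(token_hidden)
print(expert, round(gate, 3))
```

With 2 experts and top-1 routing, only one expert's feed-forward weights are evaluated per token, which is what keeps inference cost low despite the MoE layer holding both experts' parameters.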
## Key Specifications
- Architecture: MixtralForCausalLM (Mixture-of-Experts)
- Parameters: ~58K
- Experts: 2 local experts, 1 active per token
- Input: 5×5 binary mesh (25 tokens)
- Output: Digit tokens (D0-D9)
- Vocabulary Size: 14 tokens
- Context Length: 32 tokens
- Hidden Size: 32, Layers: 2, Attention Heads: 4
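The 14-token vocabulary is consistent with the I/O described above: two mesh values, ten digit labels, and `<SEP>`, leaving room for one more special token. The reconstruction below is an assumption (the extra `<PAD>` token in particular); inspect the model's tokenizer files for the real list.

```python
# Hypothetical reconstruction of the 14-token vocabulary; the actual
# tokenizer may differ -- check the repo's tokenizer files to confirm.
mesh_values = ["0", "1"]                  # binary mesh inputs
digits = [f"D{i}" for i in range(10)]     # D0-D9 output labels
specials = ["<SEP>", "<PAD>"]             # <SEP> is documented; <PAD> is assumed
vocab = mesh_values + digits + specials
print(len(vocab))
```

This also explains the 32-token context length: 25 mesh tokens plus `<SEP>` plus the predicted digit fit comfortably, with a few positions to spare.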
## Quick Start
### Serving with vLLM

```shell
python -m vllm.entrypoints.openai.api_server \
  --model junzzhu/atoMixtral-58K-5x5-DigitMesh \
  --max-model-len 32
```
### Test Example

```shell
curl http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>",
    "max_tokens": 1,
    "temperature": 0
  }'
```
Expected output: `D7`
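The same request can be issued from Python using only the standard library. This sketch constructs the request against the server started above; the actual call is left commented out since it assumes a vLLM server is running on localhost:8000.

```python
import json
import urllib.request

payload = {
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>",
    "max_tokens": 1,
    "temperature": 0,
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up; per the test example above,
# the completion text should be the digit label:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```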
## Input Format

25 space-separated binary values (0 or 1) representing a 5×5 grid, read row by row, followed by `<SEP>`:

```
[5 values] [5 values] [5 values] [5 values] [5 values] <SEP>
```
## Use Cases
- MoE architecture research at minimal scale
- Educational demonstrations of sparse expert models
- Resource-constrained digit recognition
- Pattern recognition proof-of-concepts
## License
Apache-2.0