Muse-2-350M

Model Information

Muse-2-350M is an English text-generation language model developed by Muse-research. It is a compact causal language model built for general writing, question answering, science reasoning, lightweight coding help, summarization, and assistant-style generation.

The model uses the custom MuseNova Transformer architecture. MuseNova is an auto-regressive decoder-only language model with rotary position embeddings, grouped-query attention, RMSNorm, and a gated feed-forward network. The model is designed to keep a small footprint while preserving a long context window for documents, explanations, and multi-step prompts.

Model Developer: Muse-research

Model Architecture: MuseNova is a decoder-only Transformer architecture for causal language modeling. It uses grouped-query attention for efficient inference, RoPE for position encoding, and a SwiGLU-style MLP block.

Model	Params	Input modalities	Output modalities	Context length	GQA	Shared embeddings	Primary language
Muse-2-350M	354.98M	Text	Text and code	32,768 tokens	Yes	No	English

Supported Language: English.

Model Family: Muse-2 is a compact model family focused on efficient text generation and reasoning experiments. Muse-2-350M is the release in the 350M-parameter class.

Status: This is a static model released as an offline checkpoint. It does not browse the web, retrieve live information, or update itself after release.

License: Apache 2.0.

Intended Use

Intended Use Cases: Muse-2-350M is intended for research and development use in English-language text generation. It can be used for lightweight assistant workflows, educational explanations, summarization, rewriting, coding support, science question answering, and evaluation of compact Transformer architectures.

The model is especially suited for:

general English text generation
assistant-style question answering
science and textbook-style explanations
summarization and rewriting
lightweight coding assistance
reasoning and multiple-choice evaluation research
long-context prompt experiments
custom PyTorch inference pipelines

Out of Scope: Muse-2-350M should not be used as the only source of truth for medical, legal, financial, safety-critical, or identity-sensitive decisions. It should not be treated as a live knowledge system, a safety classifier, or a substitute for expert review.

How to Use

This repository contains the Muse-2-350M tokenizer, configuration, and model weights. The model architecture is custom, so inference should be run with a MuseNova-compatible PyTorch implementation.

Install the common runtime packages:

pip install torch safetensors tokenizers

Load the tokenizer and weights:

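from pathlib import Path

from safetensors.torch import load_file
from tokenizers import Tokenizer

# Local directory holding tokenizer.json, config.json, and model.safetensors
model_dir = Path("Muse-2-350M")

# The tokenizer is loaded with the standalone `tokenizers` package
tokenizer = Tokenizer.from_file(str(model_dir / "tokenizer.json"))
# load_file returns a dict mapping tensor names to torch tensors (on CPU by default)
state_dict = load_file(str(model_dir / "model.safetensors"))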

A compatible implementation should construct the MuseNovaForCausalLM architecture from config.json, then load model.safetensors.
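
One quick sanity check after loading is to total the tensor sizes in the state dict and compare the result with the 354.98M parameter count reported for the model. The helper below is a minimal sketch: `count_parameters` is a hypothetical name, the tensor names in the toy example are made up, and the function works on plain shape tuples so it does not depend on any MuseNova code.

```python
from math import prod


def count_parameters(shapes):
    """Sum the element counts of every shape in a name -> shape mapping."""
    return sum(prod(shape) for shape in shapes.values())


# Toy example with made-up tensor names (not the real checkpoint keys).
toy_shapes = {"embed.weight": (8, 4), "norm.weight": (4,)}
print(count_parameters(toy_shapes))  # 8*4 + 4 = 36
```

With the `state_dict` loaded above, `count_parameters({name: tuple(t.shape) for name, t in state_dict.items()})` should land close to 354.98M if the file is complete.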

Prompt Format

Muse-2-350M uses a simple role-style prompt format:

<|system|>
You are Muse-2, a helpful English assistant.<|end|>
<|user|>
Explain why the sky appears blue.<|end|>
<|assistant|>

For direct completion tasks, plain text prompts can also be used.
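
The role-style format above can be assembled with a small helper. This is a sketch rather than an official template: `build_prompt` is a hypothetical name, and closing earlier assistant turns with `<|end|>` in multi-turn prompts is an assumption that the card does not confirm.

```python
def build_prompt(messages):
    """Render (role, text) pairs into the role-tagged format and open an assistant turn."""
    parts = []
    for role, text in messages:
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unknown role: {role}")
        parts.append(f"<|{role}|>\n{text}<|end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt([
    ("system", "You are Muse-2, a helpful English assistant."),
    ("user", "Explain why the sky appears blue."),
])
print(prompt)
```

A generation loop would typically stop when the model emits `<|end|>`, although that stopping rule is also an assumption here.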

Generation Settings

For deterministic question answering:

generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.0,
    "do_sample": False,  # greedy decoding: always take the most likely token
}

For general chat and writing:

generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "do_sample": True,
}

For longer explanations:

generation_config = {
    "max_new_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
    "do_sample": True,
}
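
To make the settings above concrete, here is a minimal single-step sampler in pure Python. It is a sketch under stated assumptions, not the MuseNova implementation: `sample_next_token` is a hypothetical name, and applying `top_k` before `top_p` is one common ordering that a given runtime may not share.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None,
                      do_sample=True, rng=random):
    """Pick one token id from a list of raw logits."""
    # Greedy decoding: skip temperature and filters, take the arg-max.
    if not do_sample or temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Rank candidate ids by temperature-scaled logit, highest first.
    scaled = [x / temperature for x in logits]
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)

    # top_k: keep only the k highest-scoring candidates.
    if top_k is not None and top_k > 0:
        order = order[:top_k]

    # Softmax over the survivors, shifted by the maximum for stability.
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    if top_p is not None and 0.0 < top_p < 1.0:
        cumulative, cutoff = 0.0, len(order)
        for n, p in enumerate(probs, start=1):
            cumulative += p
            if cumulative >= top_p:
                cutoff = n
                break
        order, probs = order[:cutoff], probs[:cutoff]

    # random.choices treats the weights as relative, so no renormalization is needed.
    return rng.choices(order, weights=probs, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]
print(sample_next_token(toy_logits, do_sample=False))  # greedy pick: 1
print(sample_next_token(toy_logits, temperature=0.7, top_p=0.9, top_k=40))
```

Lower temperatures sharpen the distribution toward the top token, while `top_k` and `top_p` trim the unlikely tail before the draw.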

Model Architecture

Muse-2-350M is a compact Transformer language model with a long context window and grouped-query attention.

Component	Value
Architecture	MuseNovaForCausalLM
Model type	Decoder-only causal language model
Parameters	354.98M
Hidden size	1024
Intermediate size	4352
Layers	18
Attention heads	16
Key/value heads	4
Position encoding	Rotary position embeddings
Context length	32,768 tokens
Vocabulary size	32,768
Normalization	RMSNorm
MLP	Gated feed-forward network
Weight format	safetensors
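
The values in this table can be cross-checked against the reported parameter count. The sketch below assumes bias-free linear layers, a head dimension of hidden size divided by attention heads, three projection matrices in the gated MLP, two RMSNorm weight vectors per layer plus a final one, and separate input and output embeddings (the card lists shared embeddings as "No"). Only the last of these is stated in the card; `estimate_parameters` is a hypothetical helper.

```python
def estimate_parameters(hidden, intermediate, layers, heads, kv_heads, vocab,
                        tied_embeddings=False):
    """Estimate the parameter count of a GQA decoder with a gated MLP."""
    head_dim = hidden // heads        # assumption: 1024 / 16 = 64
    kv_dim = kv_heads * head_dim      # width of the key and value projections
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * intermediate   # gate, up, and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + hidden + embeddings  # "+ hidden" is the final norm


total = estimate_parameters(hidden=1024, intermediate=4352, layers=18,
                            heads=16, kv_heads=4, vocab=32_768)
print(f"{total:,}")  # 354,980,864
```

Under these assumptions the estimate rounds to 354.98M, which agrees with the table, although the agreement alone does not confirm the exact layer layout.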

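Grouped-query attention lets several query heads share one key/value head. With 16 query heads and 4 key/value heads, each key/value head serves 4 query heads, so the key/value cache is a quarter of the size it would be with one key/value head per query head. This reduces inference memory pressure while keeping all 16 query heads.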
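
As a rough illustration of the memory effect, the sketch below estimates the key/value cache for one sequence at the full 32,768-token context in 16-bit precision. The head dimension of 64 (hidden size divided by attention heads) is an assumption, and `kv_cache_bytes` is a hypothetical helper; real runtimes add their own overhead.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value  # 2 = K and V


HEAD_DIM = 1024 // 16  # assumption: hidden size / attention heads

# Muse-2-350M layout: 4 key/value heads.
gqa = kv_cache_bytes(layers=18, kv_heads=4, head_dim=HEAD_DIM, tokens=32_768)
# Hypothetical comparison: one key/value head per query head.
mha = kv_cache_bytes(layers=18, kv_heads=16, head_dim=HEAD_DIM, tokens=32_768)

print(gqa / 2**20, mha / 2**20)  # 576.0 2304.0 (MiB)
```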

Capabilities

General Text Generation

Muse-2-350M can generate English prose, continue passages, rewrite text, summarize information, and answer direct questions. It is designed for compact assistant-style use rather than large-scale frontier reasoning.

Science and Knowledge Questions

The model can answer basic and intermediate science questions, explain concepts, and respond to textbook-style prompts. It can also be used for multiple-choice reasoning evaluations, although answers should be checked.

Math and Structured Reasoning

Muse-2-350M can attempt arithmetic, word problems, and step-by-step explanations. As a compact model, it may make mistakes on multi-step calculations or problems requiring precise symbolic manipulation.

Coding

The model can help with short code snippets, Python-style examples, explanations of programming concepts, and simple debugging. It is not specialized as a production coding model.

Long Context

Muse-2-350M supports a 32,768-token context window, enabling longer prompts, documents, and multi-part instructions. Long-context quality can vary depending on prompt structure and generation settings.
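
When a prompt might exceed the window, the token budget has to be enforced before generation. The sketch below assumes that prompt tokens and newly generated tokens share the same 32,768-token window, and it uses left truncation, which is only one possible policy (it can drop a system prompt at the start). `fit_to_context` is a hypothetical helper.

```python
CONTEXT_LENGTH = 32_768  # context window from the model card


def fit_to_context(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Drop the oldest tokens so the prompt plus the generation fits the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]


# Toy example with a tiny window: 10 prompt tokens, 4 reserved for output.
print(fit_to_context(list(range(10)), max_new_tokens=4, context_length=8))  # [6, 7, 8, 9]
```

With the `tokenizers` package, the input ids would come from `tokenizer.encode(text).ids`.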

Benchmarks

Benchmark values are reported for transparency and should be read as approximate automatic-evaluation signals. Scores for small models can vary noticeably with prompt formatting and answer extraction.

Category	Benchmark	Split / Task	Metric	Score (%)
Science reasoning	GPQA-Diamond public mirror	fingertap/GPQA-Diamond, test	Accuracy	28.79

The GPQA value above was produced with letter-scoring over 198 questions and saved in .eval_results/gpqa.yaml.
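
To gauge how approximate a score over 198 questions is, the sketch below recovers the number of correct answers and a binomial standard error. It assumes the questions are independent, which is a simplification; `accuracy_summary` is a hypothetical helper.

```python
from math import sqrt


def accuracy_summary(accuracy_pct, n_questions):
    """Return (correct answers, standard error in percentage points)."""
    correct = round(accuracy_pct / 100 * n_questions)
    p = correct / n_questions
    std_err = sqrt(p * (1 - p) / n_questions)
    return correct, 100 * std_err


correct, std_err_pct = accuracy_summary(28.79, 198)
print(correct, round(std_err_pct, 1))  # 57 3.2
```

A standard error of roughly 3.2 points puts the reported accuracy a little over one standard error above the 25% expected from random guessing on four-option questions, so small differences on this benchmark should be read with caution.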

Hardware and Software

Muse-2-350M is small enough for comfortable inference on a single modern GPU. CPU inference is possible with a compatible implementation, but generation speed depends heavily on the runtime.

Precision	Approximate VRAM class	Notes
BF16 / FP16	2 GB+	Recommended for GPU inference
FP32	4 GB+	Useful for debugging; slower and larger
Quantized	Runtime dependent	Requires external quantization tooling
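
The weights themselves account for only part of these VRAM classes. The sketch below multiplies the reported parameter count by the bytes per parameter; it covers weights only and leaves out activations, the key/value cache, and runtime overhead, which is why the practical figures in the table are higher. `weight_gib` is a hypothetical helper.

```python
PARAMS = 354_980_000  # 354.98M from the model card, rounded

BYTES_PER_PARAM = {"BF16 / FP16": 2, "FP32": 4}


def weight_gib(params, bytes_per_param):
    """Size of the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gib(PARAMS, size):.2f} GiB")
# BF16 / FP16: 0.66 GiB
# FP32: 1.32 GiB
```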

Recommended packages:

torch
safetensors
tokenizers

Data Scope

Muse-2-350M is intended for English-language generation and reasoning. It is not a multilingual model and should not be expected to provide strong performance outside English.

The model may reflect biases, errors, and omissions present in public text data. It does not contain a retrieval system and does not know about events after its offline data snapshot unless those facts are present in the prompt.

Responsibility and Safety

Muse-2-350M is a general-purpose text-generation model. Developers are responsible for testing it in their own use cases and applying appropriate safeguards.

Recommended deployment practices:

Evaluate the model on task-specific data before use.
Use human review for high-impact outputs.
Add content filters or policy layers where needed.
Avoid using raw model output as authoritative factual advice.
Monitor for hallucinations, unsafe completions, and prompt sensitivity.

Limitations

The model can hallucinate facts.
The model can make arithmetic, science, and coding mistakes.
The model may be sensitive to prompt wording.
The model can show answer-choice bias on multiple-choice tasks.
The model is not a live search or retrieval system.
The model is not specialized for medical, legal, financial, or safety-critical advice.
Long-context support does not guarantee perfect long-context reasoning.

Citation

@misc{museresearch2026muse2_350m,
  title        = {Muse-2-350M},
  author       = {Muse-research},
  year         = {2026},
  url          = {https://huggingface.co/Muse-research/Muse-2-350M}
}
