Muse-2-350M
Model Information
Muse-2-350M is an English text-generation language model developed by Muse-research. It is a compact causal language model built for general writing, question answering, science reasoning, lightweight coding help, summarization, and assistant-style generation.
The model uses the custom MuseNova Transformer architecture. MuseNova is an auto-regressive decoder-only language model with rotary position embeddings, grouped-query attention, RMSNorm, and a gated feed-forward network. The model is designed to keep a small footprint while preserving a long context window for documents, explanations, and multi-step prompts.
Model Developer: Muse-research
Model Architecture: MuseNova is a decoder-only Transformer architecture for causal language modeling. It uses grouped-query attention for efficient inference, RoPE for position encoding, and a SwiGLU-style MLP block.
| Model | Params | Input modalities | Output modalities | Context length | GQA | Shared embeddings | Primary language |
|---|---|---|---|---|---|---|---|
| Muse-2-350M | 354.98M | Text | Text and code | 32,768 tokens | Yes | No | English |
Supported Language: English.
Model Family: Muse-2 is a compact model family focused on efficient text generation and reasoning experiments. Muse-2-350M is the 350M-parameter class release.
Status: This is a static model released as an offline checkpoint. It does not browse the web, retrieve live information, or update itself after release.
License: Apache 2.0.
Intended Use
Intended Use Cases: Muse-2-350M is intended for research and development use in English-language text generation. It can be used for lightweight assistant workflows, educational explanations, summarization, rewriting, coding support, science question answering, and evaluation of compact Transformer architectures.
The model is especially suited for:
- general English text generation
- assistant-style question answering
- science and textbook-style explanations
- summarization and rewriting
- lightweight coding assistance
- reasoning and multiple-choice evaluation research
- long-context prompt experiments
- custom PyTorch inference pipelines
Out of Scope: Muse-2-350M should not be used as the only source of truth for medical, legal, financial, safety-critical, or identity-sensitive decisions. It should not be treated as a live knowledge system, a safety classifier, or a substitute for expert review.
How to Use
This repository contains the Muse-2-350M tokenizer, configuration, and model weights. The model architecture is custom, so inference should be run with a MuseNova-compatible PyTorch implementation.
Install the common runtime packages:
```bash
pip install torch safetensors tokenizers
```
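Load the tokenizer, configuration, and weights:
```python
import json
from pathlib import Path
from safetensors.torch import load_file
from tokenizers import Tokenizer

model_dir = Path("Muse-2-350M")  # local copy of this repository
tokenizer = Tokenizer.from_file(str(model_dir / "tokenizer.json"))
config = json.loads((model_dir / "config.json").read_text())  # architecture hyperparameters
state_dict = load_file(str(model_dir / "model.safetensors"))  # dict: tensor name -> torch.Tensor

# Sanity check: should be close to the 354.98M parameters listed in this card.
print(f"{sum(t.numel() for t in state_dict.values()) / 1e6:.2f}M parameters")
```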
A compatible implementation should construct the MuseNovaForCausalLM architecture from config.json, then load model.safetensors.
Prompt Format
Muse-2-350M uses a simple role-style prompt format:
```text
<|system|>
You are Muse-2, a helpful English assistant.<|end|>
<|user|>
Explain why the sky appears blue.<|end|>
<|assistant|>
```
For direct completion tasks, plain text prompts can also be used.
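The role-style format can also be assembled programmatically. The sketch below is a hypothetical helper, `build_prompt`, that assumes the exact single-turn layout shown above: a newline after each role tag, `<|end|>` closing every completed turn, and an open `<|assistant|>` turn at the end for the model to complete.

```python
from typing import Optional

DEFAULT_SYSTEM = "You are Muse-2, a helpful English assistant."


def build_prompt(user: str, system: Optional[str] = DEFAULT_SYSTEM) -> str:
    """Hypothetical helper: render one turn in the Muse-2 role-style format."""
    parts = []
    if system is not None:
        # Optional system turn, closed with <|end|> like every completed turn.
        parts.append(f"<|system|>\n{system}<|end|>\n")
    parts.append(f"<|user|>\n{user}<|end|>\n")
    # The assistant turn is left open so the model generates the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt("Explain why the sky appears blue.")
```

Whether the prompt should end with a newline after `<|assistant|>`, and whether generation should stop when the model emits `<|end|>`, are assumptions to confirm against `tokenizer.json` and your runtime.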
Generation Settings
For deterministic question answering (greedy decoding):
```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.0,  # with do_sample=False decoding is greedy, so temperature is typically unused
    "do_sample": False,
}
```
For general chat and writing:
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "do_sample": True,
}
```
For longer explanations:
```python
generation_config = {
    "max_new_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
    "do_sample": True,
}
```
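Because inference runs through a custom implementation, these dictionaries have to be interpreted by your own decoding loop. The function below is a hypothetical, plain-Python sketch of one decoding step. It applies the settings in the order common libraries such as Hugging Face transformers use: greedy argmax when `do_sample` is false, otherwise temperature scaling, top-k filtering, then top-p (nucleus) filtering before drawing a token. A real pipeline would do the same on `torch` tensors.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    do_sample: bool = True,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: pick the next token id from one step of logits."""
    # Greedy decoding: temperature, top-k, and top-p are ignored.
    if not do_sample or temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Candidate ids sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely ids.
    if top_k is not None and top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    if top_p is not None and 0.0 < top_p < 1.0:
        kept_total = sum(probs[i] for i in order)
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i] / kept_total
            if mass >= top_p:
                break
        order = kept

    # random.choices accepts relative weights, so no explicit renormalization is needed.
    rng = rng or random.Random()
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

The preset keys other than `max_new_tokens` (which bounds the outer generation loop) map directly onto the keyword arguments, for example `sample_next_token(logits, temperature=0.7, top_p=0.9, top_k=40, do_sample=True)`.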
Model Architecture
Muse-2-350M is a compact Transformer language model with a long context window and grouped-query attention.
| Component | Value |
|---|---|
| Architecture | MuseNovaForCausalLM |
| Model type | Decoder-only causal language model |
| Parameters | 354.98M |
| Hidden size | 1024 |
| Intermediate size | 4352 |
| Layers | 18 |
| Attention heads | 16 |
| Key/value heads | 4 |
| Position encoding | Rotary position embeddings |
| Context length | 32,768 tokens |
| Vocabulary size | 32,768 |
| Normalization | RMSNorm |
| MLP | Gated feed-forward network |
| Weight format | safetensors |
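The parameter count in the table can be checked from the other rows. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the MuseNova source: no bias terms, head dimension equal to hidden size divided by attention heads (64), a SwiGLU-style MLP with three projection matrices, two RMSNorm weight vectors per layer plus a final one, untied input and output embeddings (matching "Shared embeddings: No"), and RoPE contributing no learned weights.

```python
def count_parameters(
    vocab_size: int = 32_768,
    hidden_size: int = 1024,
    intermediate_size: int = 4352,
    num_layers: int = 18,
    num_heads: int = 16,
    num_kv_heads: int = 4,
    tied_embeddings: bool = False,
) -> int:
    """Estimate the weight count of a bias-free GQA decoder with a gated MLP."""
    head_dim = hidden_size // num_heads  # 64
    kv_dim = num_kv_heads * head_dim  # 256
    # q and o projections are hidden x hidden; k and v are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
    norms = 2 * hidden_size  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tied_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + lm_head + final_norm


print(f"{count_parameters():,}")  # 354,980,864
```

Under these assumptions the estimate reproduces the 354.98M figure, which suggests the listed count includes both embedding matrices; tying them would remove about 33.5M parameters.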
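Grouped-query attention uses fewer key/value heads than query heads. In Muse-2-350M, the 16 query heads share 4 key/value heads (4 query heads per key/value head), so the key/value cache is a quarter of the size that standard multi-head attention with 16 key/value heads would need, while each query head still computes its own attention pattern.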
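The grouping and its memory effect can be illustrated with a small sketch. It assumes contiguous grouping (query heads 0 to 3 share key/value head 0, and so on), as in common implementations such as the `repeat_kv` helper in Hugging Face transformers; whether MuseNova groups heads exactly this way is an assumption. The cache estimate uses the table's 18 layers, 4 key/value heads, and a head dimension of 64 (hidden size 1024 divided by 16 heads), with 2 bytes per value for BF16/FP16.

```python
def kv_head_for_query_head(query_head: int, num_heads: int = 16, num_kv_heads: int = 4) -> int:
    """Index of the shared key/value head used by a given query head (contiguous grouping)."""
    group_size = num_heads // num_kv_heads  # 4 query heads per key/value head
    return query_head // group_size


def kv_cache_bytes(
    seq_len: int,
    num_kv_heads: int = 4,
    num_layers: int = 18,
    head_dim: int = 64,
    bytes_per_value: int = 2,  # BF16 / FP16
    batch_size: int = 1,
) -> int:
    """Bytes needed to cache keys and values (factor 2) for every layer."""
    return 2 * batch_size * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


print([kv_head_for_query_head(h) for h in range(16)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
gqa = kv_cache_bytes(32_768)
mha = kv_cache_bytes(32_768, num_kv_heads=16)
print(gqa / 2**20, mha / 2**20)  # 576.0 2304.0 (MiB at the full 32,768-token context)
```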
Capabilities
General Text Generation
Muse-2-350M can generate English prose, continue passages, rewrite text, summarize information, and answer direct questions. It is designed for compact assistant-style use rather than large-scale frontier reasoning.
Science and Knowledge Questions
The model can answer basic and intermediate science questions, explain concepts, and respond to textbook-style prompts. It can also be used for multiple-choice reasoning evaluations, although answers should be checked.
Math and Structured Reasoning
Muse-2-350M can attempt arithmetic, word problems, and step-by-step explanations. As a compact model, it may make mistakes on multi-step calculations or problems requiring precise symbolic manipulation.
Coding
The model can help with short code snippets, Python-style examples, explanations of programming concepts, and simple debugging. It is not specialized as a production coding model.
Long Context
Muse-2-350M supports a 32,768-token context window, enabling longer prompts, documents, and multi-part instructions. Long-context quality can vary depending on prompt structure and generation settings.
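The window has to hold both the prompt and the generated tokens. A minimal sketch of a budget check follows; `fit_to_context` is a hypothetical helper that truncates from the left, keeping the most recent prompt tokens. Left truncation is only one simple policy: it drops the system turn first, so structured prompts may need smarter trimming.

```python
from typing import List

CONTEXT_LENGTH = 32_768  # Muse-2-350M context window, in tokens


def fit_to_context(prompt_ids: List[int], max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    # Dropping the oldest tokens keeps the end of the prompt (e.g. the open assistant turn).
    return prompt_ids[-budget:]
```

With the `tokenizers` library, the input would be `tokenizer.encode(prompt).ids` and `max_new_tokens` would come from the chosen generation preset.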
Benchmarks
Benchmark values are reported for reference only and should be read as approximate signals from automatic evaluation. Scores for small models can shift noticeably with prompt formatting and answer extraction.
| Category | Benchmark | Split / Task | Metric | Score |
|---|---|---|---|---|
| Science reasoning | GPQA-Diamond (public mirror) | fingertap/GPQA-Diamond, test | Accuracy | 28.79 |
The GPQA value above was produced with letter-scoring over 198 questions and saved in .eval_results/gpqa.yaml.
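The card does not spell out the scoring procedure, so the following is only a sketch of one common form of letter scoring. It assumes the evaluator gives the model a score (for example a log-probability) for each answer letter of a question and counts the highest-scoring letter as the model's answer. The function names are hypothetical and do not describe the harness that wrote `.eval_results/gpqa.yaml`.

```python
from typing import Dict, List


def pick_letter(letter_scores: Dict[str, float]) -> str:
    """Choose the answer letter with the highest model score (e.g. log-probability)."""
    return max(letter_scores, key=letter_scores.get)


def letter_accuracy(all_scores: List[Dict[str, float]], gold: List[str]) -> float:
    """Percentage of questions whose top-scoring letter matches the gold letter."""
    correct = sum(pick_letter(s) == g for s, g in zip(all_scores, gold))
    return 100.0 * correct / len(gold)


# Consistency check on the reported number: 57 correct out of 198 questions.
print(round(100.0 * 57 / 198, 2))  # 28.79
```

Assuming the score is plain accuracy over all 198 questions, 28.79 corresponds to 57 correct answers. For reference, uniform guessing over four options would average 25%.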
Hardware and Software
Muse-2-350M is small enough to run comfortably on a single modern GPU. CPU inference is possible with a compatible implementation, but generation speed depends heavily on the runtime.
| Precision | Approximate VRAM class | Notes |
|---|---|---|
| BF16 / FP16 | 2 GB+ | Recommended for GPU inference |
| FP32 | 4 GB+ | Useful for debugging, slower and larger |
| Quantized | Runtime dependent | Requires external quantization tooling |
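The VRAM classes above can be related to a simple bytes-per-parameter estimate. The sketch below counts weight memory only, using the 354.98M figure from the model table. Activations, the key/value cache (which grows with context length), and runtime overhead come on top, which is consistent with the headroom in the table. The 8-bit and 4-bit rows are idealized and ignore the scale and zero-point overhead that real quantization formats add.

```python
NUM_PARAMETERS = 354.98e6  # from the model table

# Bytes per weight; int8/int4 are idealized and ignore quantization metadata.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(precision: str, num_parameters: float = NUM_PARAMETERS) -> float:
    """Memory for the weights alone, in GiB."""
    return num_parameters * BYTES_PER_PARAM[precision] / 2**30


for name in ("fp32", "bf16", "int8", "int4"):
    print(f"{name}: {weight_memory_gib(name):.2f} GiB")
```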
Recommended packages:
- torch
- safetensors
- tokenizers
Data Scope
Muse-2-350M is intended for English-language generation and reasoning. It is not a multilingual model and should not be expected to provide strong performance outside English.
The model may reflect biases, errors, and omissions present in public text data. It does not contain a retrieval system and does not know about events after its offline data snapshot unless those facts are present in the prompt.
Responsibility and Safety
Muse-2-350M is a general-purpose text-generation model. Developers are responsible for testing it in their own use cases and applying appropriate safeguards.
Recommended deployment practices:
- Evaluate the model on task-specific data before use.
- Use human review for high-impact outputs.
- Add content filters or policy layers where needed.
- Avoid using raw model output as authoritative factual advice.
- Monitor for hallucinations, unsafe completions, and prompt sensitivity.
Limitations
- The model can hallucinate facts.
- The model can make arithmetic, science, and coding mistakes.
- The model may be sensitive to prompt wording.
- The model can show answer-choice bias on multiple-choice tasks.
- The model is not a live search or retrieval system.
- The model is not specialized for medical, legal, financial, or safety-critical advice.
- Long-context support does not guarantee perfect long-context reasoning.
Citation
```bibtex
@misc{museresearch2026muse2_350m,
  title = {Muse-2-350M},
  author = {Muse-research},
  year = {2026},
  url = {https://huggingface.co/Muse-research/Muse-2-350M}
}
```