How to use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Load one of the quantized files from this repo (Q4_K_M shown here;
# see "Available files" below for the other options).
llm = Llama.from_pretrained(
    repo_id="Akicou/Quasar-10B-GGUF",
    filename="Quasar-10B-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
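
create_chat_completion returns an OpenAI-style response dict; the reply text should be under response["choices"][0]["message"]["content"]. Note the behavior caveat further down: this model does not appear to be chat-tuned, so plain completion may give more predictable results than the chat API.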

Quasar-10B-GGUF

GGUF conversion and quantizations of silx-ai/Quasar-10B.

Upstream metadata lists Qwen/Qwen3.5-9B-Base as the base model for silx-ai/Quasar-10B; this repository is a GGUF quantization/conversion of silx-ai/Quasar-10B.

These files were converted/quantized with my llama.cpp fork: https://github.com/Akicou/llama.cpp

They can also be run with that fork. If upstream llama.cpp does not recognize the model architecture or metadata, build and use the fork instead.

Available files

File                      Type                  Size
Quasar-10B-f16.gguf       F16 GGUF              17,217,163,520 bytes
Quasar-10B-Q4_K_M.gguf    Q4_K_M quantization    5,428,039,936 bytes
Quasar-10B-Q4_K_S.gguf    Q4_K_S quantization    5,153,411,328 bytes
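
If you only need one of these files, a minimal sketch using the huggingface_hub Python API (the filename is the Q4_K_M entry from the table above; swap in any of the others):

# !pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Download a single quant from the repo; returns the local cache path.
path = hf_hub_download(
    repo_id="Akicou/Quasar-10B-GGUF",
    filename="Quasar-10B-Q4_K_M.gguf",
)
print(path)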

Runtime

Example using the forked llama.cpp:

git clone https://github.com/Akicou/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

hf download Akicou/Quasar-10B-GGUF Quasar-10B-Q4_K_M.gguf --local-dir .
./build/bin/llama-cli -m Quasar-10B-Q4_K_M.gguf -p "hi"

Adjust the binary path for your platform/build type if needed.
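
If you would rather drive the downloaded file from Python instead of the CLI, a minimal sketch with llama-cpp-python (this assumes your installed wheel, or a build of it against the fork above, supports the quasar architecture):

from llama_cpp import Llama

# Load the locally downloaded quant directly by path.
llm = Llama(
    model_path="Quasar-10B-Q4_K_M.gguf",
    n_ctx=4096,  # request far less than the 2,097,152 maximum in the metadata
)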

Important behavior note

In testing, this model did not appear to be fine-tuned for instruction following or normal conversational/chat use; it behaves more like a continued-pretraining checkpoint. For example, even when prompted with something as simple as hi, it tended to respond as if it had been asked to solve a constrained mathematical problem.

Because of that, do not expect reliable assistant-style instruction following from these files unless you add your own prompting/evaluation setup or fine-tune the model further.
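
Given that, treating it as a base model and using plain text completion may be more predictable than the chat API. A minimal sketch; the prompt and sampling settings are arbitrary illustrations, not tested recommendations:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Akicou/Quasar-10B-GGUF",
    filename="Quasar-10B-Q4_K_M.gguf",
)

# Plain completion: the model continues the prompt directly,
# without being wrapped in the chat template.
out = llm(
    "The capital of France is",
    max_tokens=16,
    temperature=0.2,
)
print(out["choices"][0]["text"])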

Model card metadata

  • Base model: silx-ai/Quasar-10B
  • Base-model relation: quantized GGUF conversion
  • Upstream base listed by silx-ai/Quasar-10B: Qwen/Qwen3.5-9B-Base
  • License: Apache-2.0, inherited from the upstream model metadata
  • Languages listed upstream: English (en) and Arabic (ar)
  • Pipeline: text generation

GGUF metadata reported by Hugging Face Hub

The Hub detected the following GGUF metadata from the uploaded files:

  • Architecture: quasar
  • Context length: 2,097,152
  • EOS token: <|endoftext|>
  • Chat template: present in GGUF metadata
  • Total file size for the detected F16 GGUF: 17,217,163,520 bytes
  • Repo contains F16 plus Q4_K_M and Q4_K_S GGUF files
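
To inspect this metadata locally, one option is the gguf Python package that ships with llama.cpp (a sketch; key names follow the usual llama.cpp conventions and may differ for this architecture):

# !pip install gguf
from gguf import GGUFReader

reader = GGUFReader("Quasar-10B-f16.gguf")

# List every metadata key found in the file header.
for key in reader.fields:
    print(key)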