# CyberSecQwen-4B-GGUF

GGUF Q4_K_M quantized version of CyberSecQwen-4B.

## Quantization

| Parameter | Value |
|---|---|
| Method | GGUF Q4_K_M (llama.cpp) |
| Weight precision | 4-bit (Q4_K_M: 4-bit block-scaled with k-quant importance) |
| Quantization tool | llama.cpp (built from `master`) |
| Conversion tool | `convert_hf_to_gguf.py` |
| Quantization hardware | Modal A10G |
| File | `cybersecqwen-4b-Q4_K_M.gguf` (2.5 GB) |
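
The two-step conversion can be reproduced with stock llama.cpp tooling. The commands below are a sketch with illustrative paths, not the exact Modal job:

```bash
# 1) Convert the HF checkpoint to an FP16 GGUF (source path is illustrative)
python convert_hf_to_gguf.py ./CyberSecQwen-4B \
  --outfile cybersecqwen-4b-f16.gguf --outtype f16

# 2) Quantize the FP16 GGUF down to Q4_K_M with llama.cpp's quantize binary
./llama-quantize cybersecqwen-4b-f16.gguf cybersecqwen-4b-Q4_K_M.gguf Q4_K_M
```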

## CTI-Bench Evaluation

Evaluated under the Foundation-Sec-8B protocol:

- Temperature 0.3, max_tokens 512, concurrency 8
- 5 independent trials, zero-shot (no system prompt)
- llama.cpp server on a Modal L4 GPU (a sample request is sketched below)
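
A single evaluation request against the server looks roughly like this, using llama.cpp's OpenAI-compatible endpoint; the prompt is a placeholder, not the actual CTI-Bench harness:

```bash
# One zero-shot request with the protocol's sampling settings
# (prompt is a placeholder; the harness sends CTI-Bench items)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Which CWE best describes CVE-2021-44228?"}],
        "temperature": 0.3,
        "max_tokens": 512
      }'
```
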
| Task | GGUF Q4_K_M | AWQ 4-bit | FP16 reference |
|---|---|---|---|
| CTI-MCQ (2,500 items) | 0.5368 ± 0.0048 | 0.5921 ± 0.0083 | 0.5868 ± 0.0029 |
| CTI-RCM (1,000 items) | 0.6254 ± 0.0063 | 0.5814 ± 0.0025 | 0.6664 ± 0.0023 |

Key findings:

- CTI-RCM (CVE→CWE classification): GGUF Q4_K_M is the best quantized variant, trailing FP16 by 4.1 points while beating AWQ 4-bit by 4.4 points.
- CTI-MCQ (CTI knowledge): AWQ 4-bit outperforms GGUF on multiple-choice questions.
- GGUF preserves task-specific classification accuracy better, owing to its block-wise k-quant importance scaling.

### Trial results

#### CTI-MCQ

| Trial | Seed | Accuracy |
|---|---|---|
| 1 | 42 | 0.5420 |
| 2 | 43 | 0.5280 |
| 3 | 44 | 0.5360 |
| 4 | 45 | 0.5392 |
| 5 | 46 | 0.5388 |

#### CTI-RCM

| Trial | Seed | Accuracy |
|---|---|---|
| 1 | 42 | 0.6270 |
| 2 | 43 | 0.6300 |
| 3 | 44 | 0.6270 |
| 4 | 45 | 0.6300 |
| 5 | 46 | 0.6130 |
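
The ± figures in the summary table match the population standard deviation over the five trials; a quick check with awk reproduces 0.6254 ± 0.0063 for CTI-RCM:

```bash
# Mean ± population std over the five CTI-RCM trial accuracies
printf '%s\n' 0.6270 0.6300 0.6270 0.6300 0.6130 |
  awk '{ s += $1; ss += $1 * $1; n++ }
       END { m = s / n; printf "%.4f ± %.4f\n", m, sqrt(ss / n - m * m) }'
```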

## Quantization variants

| Variant | CTI-MCQ | CTI-RCM | Size | Engine |
|---|---|---|---|---|
| AWQ 4-bit | 0.5921 | 0.5814 | 2.7 GB | vLLM |
| GGUF Q4_K_M | 0.5368 | 0.6254 | 2.5 GB | llama.cpp |

Choose GGUF for vulnerability classification and AWQ for MCQ and general chat.

## Usage with llama.cpp

```bash
# Download the quantized model
wget https://huggingface.co/ree2raz/CyberSecQwen-4B-GGUF/resolve/main/cybersecqwen-4b-Q4_K_M.gguf

# Serve with full GPU offload (-ngl 99) and a 4K context window
./llama-server -m cybersecqwen-4b-Q4_K_M.gguf --host 0.0.0.0 --port 8080 -ngl 99 -c 4096
```
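
For a quick one-off prompt without running the server, `llama-cli` works as well; the prompt here is illustrative:

```bash
# Single-shot inference against the local GGUF file
./llama-cli -m cybersecqwen-4b-Q4_K_M.gguf -ngl 99 -c 4096 \
  -p "Map CVE-2021-44228 to the most appropriate CWE." -n 256
```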

## Model Size

| Format | Size |
|---|---|
| Original FP16 | ~8 GB |
| GGUF Q4_K_M | ~2.5 GB |

## Citation

```bibtex
@misc{cybersecqwen2026,
  title  = {CyberSecQwen-4B: A Compact CTI Specialist},
  author = {Mulia, Samuel},
  year   = {2026},
  url    = {https://huggingface.co/athena129/CyberSecQwen-4B}
}
```

## Evaluation Infrastructure

GitHub repository: Modal scripts for quantization and evaluation.
