DeepSeek-V4-Flash MXFP4 Experts + INT8 Dense

An experimental DeepSeek-V4-Flash checkpoint with MXFP4 routed experts and INT8 dense linear weights, built for serving on Ampere GPUs with the AppMana vLLM fork.

Source checkpoint: deepseek-ai/DeepSeek-V4-Flash@fd53f944496234770ba80e15004f9b6d269a71f5

Conversion:

CUDA_VISIBLE_DEVICES=1 .venv/bin/python tools/ampere/dsv4_requant_checkpoint.py \
  --src /home/administrator/inference/.cache/huggingface/models--deepseek-ai--DeepSeek-V4-Flash/snapshots/fd53f944496234770ba80e15004f9b6d269a71f5 \
  --dst /home/administrator/inference/deepseek-v4-flash-dsv4-mxfp4-int8-channel-vllm \
  --device cuda:0 \
  --dense-int8-strategy channel \
  --expert-format mxfp4 \
  --num-output-shards 72 \
  --overwrite
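
After the conversion finishes, a quick sanity check is to count the shards written to the `--dst` directory and compare the result with the 72 requested by `--num-output-shards`. The sketch below assumes the tool writes its shards as `*.safetensors` files directly in the destination directory, which this card does not state; `count_shards` is a hypothetical helper, not part of the fork.

```python
from pathlib import Path


def count_shards(dst: str) -> int:
    """Count *.safetensors files directly under dst.

    Assumption: shards are stored flat in the destination directory.
    """
    return sum(1 for p in Path(dst).glob("*.safetensors") if p.is_file())


# Example usage (path taken from the command above):
#   n = count_shards("/home/administrator/inference/deepseek-v4-flash-dsv4-mxfp4-int8-channel-vllm")
#   print(n, "shard(s) found; the conversion requested 72")
```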

Expected canary values:

  • quantization_config.quant_method: dsv4_mxfp4_int8
  • quantization_config.format: mxfp4_int8_packed
  • Dense linear weights: signed INT8, channelwise scales
  • Routed experts: native MXFP4 with native E8M0 scales
  • expert_dtype: fp4
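
The config-level canaries above can be checked with a few lines of Python. This is a minimal sketch, assuming the values live in `config.json` inside the output directory and that `expert_dtype` sits either at the top level or inside `quantization_config`; neither location is confirmed by this card. `check_canaries` and `check_checkpoint` are hypothetical helpers. The tensor-level canaries (signed INT8 weights, E8M0 scales) are not covered here.

```python
import json
from pathlib import Path

# Values copied from the canary list above.
EXPECTED_QUANT = {
    "quant_method": "dsv4_mxfp4_int8",
    "format": "mxfp4_int8_packed",
}
EXPECTED_EXPERT_DTYPE = "fp4"


def check_canaries(config: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means all matched."""
    problems = []
    quant = config.get("quantization_config") or {}
    for key, want in EXPECTED_QUANT.items():
        got = quant.get(key)
        if got != want:
            problems.append(
                f"quantization_config.{key}: expected {want!r}, got {got!r}"
            )
    # Assumption: expert_dtype may be top-level or nested in quantization_config.
    got = config.get("expert_dtype", quant.get("expert_dtype"))
    if got != EXPECTED_EXPERT_DTYPE:
        problems.append(
            f"expert_dtype: expected {EXPECTED_EXPERT_DTYPE!r}, got {got!r}"
        )
    return problems


def check_checkpoint(dst: str) -> list[str]:
    """Load <dst>/config.json (assumed location) and check the canaries."""
    config = json.loads((Path(dst) / "config.json").read_text())
    return check_canaries(config)
```

An empty list from `check_checkpoint` means the config-level canaries match; any entries describe which value differed.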

This artifact is still under active development and should be evaluated for quality before production use.
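
As a rough intuition for where quality loss can come from, the sketch below round-trips a few numbers through the two formats named above: symmetric signed INT8 with one scale per channel, and an MXFP4-style block with a shared power-of-two (E8M0) scale and E2M1 elements. It is an illustration under assumptions, not the conversion tool's implementation: the helper names are hypothetical, ties round toward the smaller magnitude rather than to even, and the caller is expected to split tensors into rows or blocks (MX blocks normally hold 32 elements). It is no substitute for evaluating the served model on real tasks.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def int8_channel_roundtrip(row):
    """Quantize one channel (row) to symmetric signed INT8 and dequantize."""
    amax = max(abs(x) for x in row)
    if amax == 0.0:
        return [0.0] * len(row)
    scale = amax / 127.0  # one scale per channel
    return [max(-127, min(127, round(x / scale))) * scale for x in row]


def mxfp4_block_roundtrip(block):
    """Quantize one block to E2M1 with a shared power-of-two scale, then dequantize."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    # Subtract 2, the largest exponent of E2M1 (6.0 = 1.5 * 2**2).
    exp = math.frexp(amax)[1] - 1 - 2
    exp = max(-127, min(127, exp))  # range of an E8M0 scale
    scale = 2.0 ** exp
    out = []
    for x in block:
        target = abs(x) / scale
        # Nearest representable magnitude; values above 6.0 saturate to 6.0.
        mag = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(target - v))
        out.append(math.copysign(mag, x) * scale)
    return out


def max_abs_error(original, restored):
    """Largest absolute element-wise difference between two sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

Comparing `max_abs_error(w, int8_channel_roundtrip(w))` with `max_abs_error(w, mxfp4_block_roundtrip(w))` on sample weights gives a feel for how much coarser the 4-bit format is.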

Model size (Safetensors): 158B params
Tensor types: BF16, F32, F8_E8M0, I8, U8, I64