DeepSeek-V4-Flash MXFP4 Experts + INT8 Dense
Experimental requantized DeepSeek-V4-Flash checkpoint, with routed experts in MXFP4 and dense linear weights in INT8, intended for serving on Ampere GPUs with the AppMana vLLM fork.
Source checkpoint: deepseek-ai/DeepSeek-V4-Flash@fd53f944496234770ba80e15004f9b6d269a71f5
Conversion:
CUDA_VISIBLE_DEVICES=1 .venv/bin/python tools/ampere/dsv4_requant_checkpoint.py \
--src /home/administrator/inference/.cache/huggingface/models--deepseek-ai--DeepSeek-V4-Flash/snapshots/fd53f944496234770ba80e15004f9b6d269a71f5 \
--dst /home/administrator/inference/deepseek-v4-flash-dsv4-mxfp4-int8-channel-vllm \
--device cuda:0 \
--dense-int8-strategy channel \
--expert-format mxfp4 \
--num-output-shards 72 \
--overwrite
Expected canary values:
- quantization_config.quant_method: dsv4_mxfp4_int8
- quantization_config.format: mxfp4_int8_packed
- Dense linear weights: signed INT8, channelwise scales
- Routed experts: native MXFP4 with native E8M0 scales
- expert_dtype: fp4
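The key/value canaries above can be checked mechanically after a conversion. The following is a minimal sketch, not part of the conversion tool. It assumes the values live in the checkpoint's `config.json` and that `expert_dtype` is a top-level key; both are assumptions, and the helper names (`lookup`, `check_canaries`, `check_checkpoint`) are hypothetical.

```python
import json
from pathlib import Path

# Expected canary values taken from this card.
# Assumption: expert_dtype sits at the top level of config.json.
EXPECTED = {
    "quantization_config.quant_method": "dsv4_mxfp4_int8",
    "quantization_config.format": "mxfp4_int8_packed",
    "expert_dtype": "fp4",
}


def lookup(config, dotted_key):
    """Walk a nested dict by a dotted key; return None if any part is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_canaries(config):
    """Return (key, expected, actual) for every canary that does not match."""
    mismatches = []
    for key, expected in EXPECTED.items():
        actual = lookup(config, key)
        if actual != expected:
            mismatches.append((key, expected, actual))
    return mismatches


def check_checkpoint(checkpoint_dir):
    """Load config.json from a checkpoint directory (assumed layout) and check it."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return check_canaries(config)
```

Calling `check_checkpoint` on the `--dst` directory should return an empty list if the canaries match. The weight-format canaries (INT8 dense weights, MXFP4 experts with E8M0 scales) are not covered by this sketch and would need inspection of the tensor shards themselves.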
This artifact is still under active development and should be evaluated for quality before production use.
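As a rough intuition for what channelwise INT8 scaling does to the dense weights, here is a small pure-Python sketch of symmetric per-output-channel quantization and its round-trip error. It is an illustration under assumptions (one scale per row, a symmetric range of [-127, 127], round-to-nearest) and is not the implementation used by `dsv4_requant_checkpoint.py`. Weight round-trip error is also only a weak proxy for model quality, so task-level evaluation is still needed.

```python
def quantize_channelwise_int8(weight):
    """Symmetric INT8 quantization with one scale per row (output channel).

    `weight` is a list of rows of floats. Returns (int8_rows, scales).
    """
    q_rows, scales = [], []
    for row in weight:
        max_abs = max((abs(v) for v in row), default=0.0)
        # An all-zero row gets a scale of 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Map INT8 values back to floats using each row's scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def max_abs_error(weight, restored):
    """Largest absolute element-wise difference between two matrices."""
    return max(
        abs(a - b)
        for w_row, r_row in zip(weight, restored)
        for a, b in zip(w_row, r_row)
    )


weight = [[0.5, -1.0, 0.25], [10.0, 0.0, -5.0]]
q_rows, scales = quantize_channelwise_int8(weight)
error = max_abs_error(weight, dequantize(q_rows, scales))
```

With round-to-nearest, the per-element error is bounded by half of that row's scale, which is why a per-channel scale helps when rows have very different magnitudes.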