Deepdive404/DeepSeek-V4-Flash-bucket / inference

160 GB

74 files

Updated 7 days ago

Name	Size	Uploaded	Xet hash
README.md	951 Bytes	7 days ago	e43f9389
config.json	991 Bytes	7 days ago	2c9dafb0
convert.py	7.08 kB	7 days ago	969b192b
generate.py	6.3 kB	7 days ago	01b1677a
kernel.py	22.2 kB	7 days ago	e353d1ed
model.py	38.6 kB	7 days ago	17243ed6
requirements.txt	92 Bytes	7 days ago	0acba707

README.md

Inference code for DeepSeek models

First, convert the Hugging Face model weight files to this project's format.

export EXPERTS=256
export MP=4
export CONFIG=config.json
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
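
The values above spread 256 experts over 4 model-parallel ranks. As a rough illustration of what such a split could look like, the sketch below assumes experts are assigned to ranks in contiguous, equal-sized blocks. The README does not confirm this layout, and `experts_per_rank` and `rank_of_expert` are hypothetical helpers, not functions from convert.py.

```python
def experts_per_rank(n_experts: int, mp: int) -> int:
    """Number of experts each rank would hold under an even split."""
    if n_experts <= 0 or mp <= 0 or n_experts % mp != 0:
        raise ValueError("n_experts must be a positive multiple of mp")
    return n_experts // mp


def rank_of_expert(expert_idx: int, n_experts: int, mp: int) -> int:
    """Rank that would own an expert, ASSUMING contiguous block assignment."""
    if not 0 <= expert_idx < n_experts:
        raise IndexError("expert index out of range")
    return expert_idx // experts_per_rank(n_experts, mp)
```

With EXPERTS=256 and MP=4, this gives 64 experts per rank. A value of MP that does not divide EXPERTS is rejected here; whether convert.py performs the same check is not stated.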

Then chat with the DeepSeek model interactively.

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive

Or run batch inference from a file.

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}

Or run multi-node inference.

torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
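
In the multi-node command, each node launches MP / NODES processes. Shell integer arithmetic truncates silently, so an MP that is not a multiple of NODES would start fewer processes than intended. The sketch below builds the same argument list in Python and rejects such values up front. `procs_per_node` and `torchrun_args` are hypothetical helpers for illustration; only the flags already shown in the command above are used.

```python
from typing import List


def procs_per_node(mp: int, nodes: int) -> int:
    """Processes to launch on each node, requiring an even split."""
    if mp <= 0 or nodes <= 0 or mp % nodes != 0:
        raise ValueError("mp must be a positive multiple of nodes")
    return mp // nodes


def torchrun_args(mp: int, nodes: int, node_rank: int, master_addr: str,
                  ckpt_path: str, config: str, input_file: str) -> List[str]:
    """Argument list mirroring the multi-node command for one node."""
    if not 0 <= node_rank < nodes:
        raise ValueError("node_rank must be in [0, nodes)")
    return [
        "torchrun",
        "--nnodes", str(nodes),
        "--nproc-per-node", str(procs_per_node(mp, nodes)),
        "--node-rank", str(node_rank),
        "--master-addr", master_addr,
        "generate.py",
        "--ckpt-path", ckpt_path,
        "--config", config,
        "--input-file", input_file,
    ]
```

The resulting list can be passed to `subprocess.run` on each node, with `node_rank` set to that node's index.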

To use FP8 instead, remove the "expert_dtype": "fp4" entry from config.json and pass --expert-dtype fp8 to convert.py.
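
The config change described above can be scripted rather than edited by hand. This is a minimal sketch that assumes "expert_dtype" is a top-level key in config.json, which the README implies but does not show. `drop_expert_dtype` and `rewrite_config` are hypothetical helpers, not part of this project.

```python
import json
from pathlib import Path


def drop_expert_dtype(config: dict) -> dict:
    """Return a copy of the config without the "expert_dtype" key.

    ASSUMPTION: the key sits at the top level of the JSON object.
    """
    return {k: v for k, v in config.items() if k != "expert_dtype"}


def rewrite_config(path: str) -> None:
    """Load a JSON config, drop "expert_dtype", and write it back in place."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(drop_expert_dtype(config), indent=2) + "\n")
```

After calling `rewrite_config("config.json")`, rerun convert.py with --expert-dtype fp8 as described. Keeping a backup of the original file is advisable, since the rewrite happens in place.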

Last updated: May 22
