160 GB
74 files
Updated 7 days ago
Name              Size
README.md         951 Bytes
config.json       991 Bytes
convert.py        7.08 kB
generate.py       6.3 kB
kernel.py         22.2 kB
model.py          38.6 kB
requirements.txt  92 Bytes
README.md

Inference code for DeepSeek models

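First, convert the Hugging Face model weight files to the format used by this project. Set HF_CKPT_PATH to the downloaded checkpoint and SAVE_PATH to the output location before running: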

export EXPERTS=256
export MP=4
export CONFIG=config.json
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
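As a quick sanity check on the values exported above, the sketch below works out how many experts each model-parallel rank would hold. It assumes the experts are split evenly across ranks, which this README does not state; `experts_per_rank` is a hypothetical helper for illustration and is not part of convert.py.

```python
def experts_per_rank(n_experts: int, model_parallel: int) -> int:
    """Return the experts held by each rank, assuming an even split (an assumption)."""
    if n_experts <= 0 or model_parallel <= 0:
        raise ValueError("n_experts and model_parallel must be positive")
    if n_experts % model_parallel != 0:
        raise ValueError(
            f"{n_experts} experts cannot be split evenly across {model_parallel} ranks"
        )
    return n_experts // model_parallel


# Values taken from the exports above: EXPERTS=256, MP=4.
print(experts_per_rank(256, 4))  # 64
```

If the check raises, it may be worth revisiting EXPERTS or MP before starting a long conversion run.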

Then chat with the DeepSeek model interactively:

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive

Or run batch inference on prompts from a file:

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}

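Or run multi-node inference. Launch the same command on every node, with NODES set to the number of nodes, RANK to that node's rank, and ADDR to the master node's address: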

torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
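The shell arithmetic `$((MP / NODES))` truncates silently when MP is not a multiple of NODES. The sketch below builds the same argument list with that case checked explicitly. `multinode_command` is a hypothetical helper, and the address and paths in the usage lines are placeholders, not values from this project.

```python
import shlex


def multinode_command(mp, nodes, rank, addr, save_path, config, input_file):
    """Build the torchrun argument list shown above for one node."""
    if nodes <= 0 or mp % nodes != 0:
        raise ValueError(f"MP={mp} must be a positive multiple of NODES={nodes}")
    if not 0 <= rank < nodes:
        raise ValueError(f"RANK={rank} must be in the range [0, {nodes})")
    return [
        "torchrun",
        "--nnodes", str(nodes),
        "--nproc-per-node", str(mp // nodes),  # processes launched on this node
        "--node-rank", str(rank),
        "--master-addr", addr,
        "generate.py",
        "--ckpt-path", save_path,
        "--config", config,
        "--input-file", input_file,
    ]


# Placeholder values for illustration only.
cmd = multinode_command(
    mp=4, nodes=2, rank=0, addr="10.0.0.1",
    save_path="/path/to/converted", config="config.json", input_file="prompts.txt",
)
print(shlex.join(cmd))
```

With MP=4 and NODES=2, each node launches two processes; only the `--node-rank` value differs between nodes.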

To use FP8 experts instead of FP4, remove the "expert_dtype": "fp4" entry from config.json and pass --expert-dtype fp8 to convert.py.
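The config.json edit can be made by hand, or scripted along the lines of the sketch below. It assumes "expert_dtype" is a top-level key in config.json, which this README does not confirm; `drop_expert_dtype` and `rewrite_config` are hypothetical helpers, and "dim" in the example is a made-up key.

```python
import json
from pathlib import Path


def drop_expert_dtype(config: dict) -> dict:
    """Return a copy of config without the "expert_dtype" key (no-op if absent)."""
    return {k: v for k, v in config.items() if k != "expert_dtype"}


def rewrite_config(path: str) -> None:
    """Rewrite the JSON file at path in place without "expert_dtype"."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(drop_expert_dtype(config), indent=2) + "\n")


# In-memory demonstration; "dim" is a made-up key for illustration.
example = {"dim": 1024, "expert_dtype": "fp4"}
print(drop_expert_dtype(example))  # {'dim': 1024}
```

Keeping a backup copy of config.json before calling `rewrite_config("config.json")` makes it easy to switch back to FP4.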

Last updated: May 22