
Forked Lingua

Setup

bash setup/create_env.sh

Once that is done, you can activate the environment:

source ~/envs/lingua_<date>/bin/activate

Data

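Download and prepare the data, replacing <mem> and <nchunks> with your own values:

python setup/download_prepare_hf_data.py dclm_baseline_1.0_10prct <mem> --data_dir /mnt/bn/tiktok-mm-5/aiic/users/linzheng/data/dclm_10prct --seed 42 --nchunks <nchunks>

Training

Launch training with 8 processes per node:

torchrun --nproc-per-node 8 -m apps.evabyte.train config=apps/evabyte/configs/evabyte_7b.yaml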