# Forked Lingua

## Setup

Create the environment:

```bash
bash setup/create_env.sh
```

Once that is done, you can activate the environment:

```bash
source ~/envs/lingua_<date>/bin/activate
```

## Data

Download and prepare the data:

```bash
python setup/download_prepare_hf_data.py dclm_baseline_1.0_10prct <mem> --data_dir /mnt/bn/tiktok-mm-5/aiic/users/linzheng/data/dclm_10prct --seed 42 --nchunks <nchunks>
```

## Training

Launch training with 8 processes per node:

```bash
torchrun --nproc-per-node 8 -m apps.evabyte.train config=apps/evabyte/configs/evabyte_7b.yaml
```