# Forked Lingua
## Setup
```bash
bash setup/create_env.sh
```
Once that is done, you can activate the environment (replace `<date>` with the date suffix of the environment the script created):
```bash
source ~/envs/lingua_<date>/bin/activate
```
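If you have created several environments over time, a small helper can pick the newest one so you don't have to type the date by hand. This is a sketch that assumes the environments live under `~/envs/lingua_<date>` (as in the command above) and that the date suffix sorts chronologically as plain text (for example `YYMMDD`). The function name `latest_lingua_env` is hypothetical and not part of the repo.

```bash
# Hypothetical helper: print the path of the newest ~/envs/lingua_* environment.
# Assumes the <date> suffix sorts chronologically as text (e.g. YYMMDD).
latest_lingua_env() {
  base="${1:-$HOME/envs}"  # optional argument overrides the default location
  last=""
  for d in "$base"/lingua_*/; do
    # If nothing matches, the pattern stays literal and is not a directory.
    [ -d "$d" ] || continue
    # Pathname expansion returns matches in sorted order, so the last one wins.
    last="$d"
  done
  [ -n "$last" ] || return 1
  printf '%s\n' "${last%/}"  # drop the trailing slash added by the pattern
}

# Usage:
# env_dir="$(latest_lingua_env)" && source "$env_dir/bin/activate"
```

The loop relies on the shell's own sorted glob expansion rather than parsing `ls` output, so directory names with spaces are handled safely.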
## Data
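Download and prepare the `dclm_baseline_1.0_10prct` dataset. Replace `<mem>` and `<nchunks>` with values suited to your machine, and point `--data_dir` at a directory you can write to:
```bash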
python setup/download_prepare_hf_data.py dclm_baseline_1.0_10prct <mem> --data_dir /mnt/bn/tiktok-mm-5/aiic/users/linzheng/data/dclm_10prct --seed 42 --nchunks <nchunks>
```
## Training
Launch training with the EvaByte 7B config on 8 GPUs:
```bash
torchrun --nproc-per-node 8 -m apps.evabyte.train config=apps/evabyte/configs/evabyte_7b.yaml
```
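The command above hard-codes 8 processes per node. As a sketch, the process count can instead follow the GPUs exposed through `CUDA_VISIBLE_DEVICES`, falling back to 8 when the variable is unset or empty. The helper name `gpu_count` is hypothetical. Running on a different number of GPUs may also change the effective global batch size, depending on how `evabyte_7b.yaml` defines its batch size.

```bash
# Hypothetical helper: number of GPUs listed in CUDA_VISIBLE_DEVICES
# (comma-separated IDs); prints 8 when the variable is unset or empty.
gpu_count() {
  if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    # Put one ID per line, then count the non-empty lines.
    printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
  else
    echo 8
  fi
}

# Usage:
# torchrun --nproc-per-node "$(gpu_count)" -m apps.evabyte.train config=apps/evabyte/configs/evabyte_7b.yaml
```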