| nohup: ignoring input |
| Namespace(save_dir='/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg', self_attn_layer_to_quant='27 16 19 17 25', mlp_layer_to_quant='27 16 19 17 25', model_id='/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen/Qwen2.5-7B', cuda_id=6) |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:21, 7.27s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:14<00:14, 7.15s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:21<00:07, 7.01s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:27<00:00, 6.87s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:27<00:00, 6.96s/it] |
| Qwen2ForCausalLM( |
| (model): Qwen2Model( |
| (embed_tokens): Embedding(152064, 3584) |
| (layers): ModuleList( |
| (0-27): 28 x Qwen2DecoderLayer( |
| (self_attn): Qwen2Attention( |
| (q_proj): Linear(in_features=3584, out_features=3584, bias=True) |
| (k_proj): Linear(in_features=3584, out_features=512, bias=True) |
| (v_proj): Linear(in_features=3584, out_features=512, bias=True) |
| (o_proj): Linear(in_features=3584, out_features=3584, bias=False) |
| ) |
| (mlp): Qwen2MLP( |
| (gate_proj): Linear(in_features=3584, out_features=18944, bias=False) |
| (up_proj): Linear(in_features=3584, out_features=18944, bias=False) |
| (down_proj): Linear(in_features=18944, out_features=3584, bias=False) |
| (act_fn): SiLUActivation() |
| ) |
| (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) |
| (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) |
| ) |
| ) |
| (norm): Qwen2RMSNorm((3584,), eps=1e-06) |
| (rotary_emb): Qwen2RotaryEmbedding() |
| ) |
| (lm_head): Linear(in_features=3584, out_features=152064, bias=False) |
| ) |
| Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation. |
| Once upon a time,there was a girl named Kate.She loved to eat chocolate very much.She ate chocolate every day. One day,Kate went to visit her friend,Alice. "I have a surprise for you," Alice said. "It's in my room." Kate ran to Alice's room.She saw a box of chocolate on the table.Kate was very happy and ate all of them. "Wow,it's so delicious!" Kate said. "Thank you for the chocolate," Alice said. " |
| quanting ... |
| layers.0.self_attn 4 bit quant |
| layers.0.mlp 4 bit quant |
| layers.1.self_attn 4 bit quant |
| layers.1.mlp 4 bit quant |
| layers.2.self_attn 4 bit quant |
| layers.2.mlp 4 bit quant |
| layers.3.self_attn 4 bit quant |
| layers.3.mlp 4 bit quant |
| layers.4.self_attn 4 bit quant |
| layers.4.mlp 4 bit quant |
| layers.5.self_attn 4 bit quant |
| layers.5.mlp 4 bit quant |
| layers.6.self_attn 4 bit quant |
| layers.6.mlp 4 bit quant |
| layers.7.self_attn 4 bit quant |
| layers.7.mlp 4 bit quant |
| layers.8.self_attn 4 bit quant |
| layers.8.mlp 4 bit quant |
| layers.9.self_attn 4 bit quant |
| layers.9.mlp 4 bit quant |
| layers.10.self_attn 4 bit quant |
| layers.10.mlp 4 bit quant |
| layers.11.self_attn 4 bit quant |
| layers.11.mlp 4 bit quant |
| layers.12.self_attn 4 bit quant |
| layers.12.mlp 4 bit quant |
| layers.13.self_attn 4 bit quant |
| layers.13.mlp 4 bit quant |
| layers.14.self_attn 4 bit quant |
| layers.14.mlp 4 bit quant |
| layers.15.self_attn 4 bit quant |
| layers.15.mlp 4 bit quant |
| layers.16.self_attn 2 bit quant |
| layers.16.mlp 2 bit quant |
| layers.17.self_attn 2 bit quant |
| layers.17.mlp 2 bit quant |
| layers.18.self_attn 4 bit quant |
| layers.18.mlp 4 bit quant |
| layers.19.self_attn 2 bit quant |
| layers.19.mlp 2 bit quant |
| layers.20.self_attn 4 bit quant |
| layers.20.mlp 4 bit quant |
| layers.21.self_attn 4 bit quant |
| layers.21.mlp 4 bit quant |
| layers.22.self_attn 4 bit quant |
| layers.22.mlp 4 bit quant |
| layers.23.self_attn 4 bit quant |
| layers.23.mlp 4 bit quant |
| layers.24.self_attn 4 bit quant |
| layers.24.mlp 4 bit quant |
| layers.25.self_attn 2 bit quant |
| layers.25.mlp 2 bit quant |
| layers.26.self_attn 4 bit quant |
| layers.26.mlp 4 bit quant |
| layers.27.self_attn 2 bit quant |
| layers.27.mlp 2 bit quant |
| quanted |
| Qwen2ForCausalLM( |
| (model): Qwen2Model( |
| (embed_tokens): Embedding(152064, 3584) |
| (layers): ModuleList( |
| (0-27): 28 x Qwen2DecoderLayer( |
| (self_attn): Qwen2Attention( |
| (q_proj): W8A16Linear(3584, 3584, bias=True, weight_quant=per_channel) |
| (k_proj): W8A16Linear(3584, 512, bias=True, weight_quant=per_channel) |
| (v_proj): W8A16Linear(3584, 512, bias=True, weight_quant=per_channel) |
| (o_proj): W8A16Linear(3584, 3584, bias=False, weight_quant=per_channel) |
| ) |
| (mlp): Qwen2MLP( |
| (gate_proj): W8A16Linear(3584, 18944, bias=False, weight_quant=per_channel) |
| (up_proj): W8A16Linear(3584, 18944, bias=False, weight_quant=per_channel) |
| (down_proj): W8A16Linear(18944, 3584, bias=False, weight_quant=per_channel) |
| (act_fn): SiLUActivation() |
| ) |
| (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) |
| (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) |
| ) |
| ) |
| (norm): Qwen2RMSNorm((3584,), eps=1e-06) |
| (rotary_emb): Qwen2RotaryEmbedding() |
| ) |
| (lm_head): Linear(in_features=3584, out_features=152064, bias=False) |
| ) |
| Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation. |
| Once upon a time anciรณnเนเธซcรกrdia indonesiaๅนถไธๆฏๅพเธงเธฒเธ Publiรฉ au 27-ยญ ... |
| oplay้ชๅฑ tรด้ชๅฑ-placement-placementยญ_ burge pestic dรฉcada tendรชncia... |
| pesticเนเธซ Cรก maรง Aรงaoยญยญยญ pestic pestic pestic pestic pestic pestic pesticacฤฑยญยญยญ pestic pestic pestic... |
| ยญยญ- pestic pesticยญยญ-ยญ pestic pestic pestic pesticยญยญ pestic pestic pestic... |
| ... |
| ... |
| ... |
| ... |
| pesticยญยญ pestic pestic pestic pestic... |
| ... |
| ... |
| ... |
| pestic pesticยญยญ-ยญ-ยญ-ยญ- |
|
|
| ================================================== |
| Starting evaluation: task=winogrande | few-shot count=5 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/winogrande.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:14:52:00 INFO [__main__:440] Selected Tasks: ['winogrande'] |
| 2025-12-09:14:52:00 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:52:00 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:52:00 INFO [__main__:440] Selected Tasks: ['winogrande'] |
| 2025-12-09:14:52:00 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:52:00 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:52:01 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:52:01 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:52:01 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:02, 1.32it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:01<00:03, 1.03s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:01<00:01, 1.29it/s]
Loading checkpoint shards: 50%|█████ | 2/4 [00:02<00:02, 1.05s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:02<00:00, 1.35it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.88it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.63it/s] |
|
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:03<00:00, 1.00it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.40it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.21it/s] |
| 2025-12-09:14:52:21 WARNING [evaluator:309] Overwriting default num_fewshot of winogrande from None to 5 |
| 2025-12-09:14:52:21 INFO [api.task:434] Building contexts for winogrande on rank 0... |
|
0%| | 0/634 [00:00<?, ?it/s]
100%|██████████| 634/634 [00:00<00:00, 29564.28it/s] |
| n136-128-154:2195229:2195229 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195229:2195229 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195229:2195229 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2195229:2195229 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2195229:2195229 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2195229:2195601 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2195229:2195601 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195229:2195601 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2195229:2195601 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2195229:2195601 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2195229:2195601 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2195229:2195601 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2195229:2195601 [0] NCCL INFO Using network IB |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclCommInitRankConfig comm 0x11614ad0 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xe72ac37abd19e19 - Init START |
| 2025-12-09:14:52:28 WARNING [evaluator:309] Overwriting default num_fewshot of winogrande from None to 5 |
| 2025-12-09:14:52:28 INFO [api.task:434] Building contexts for winogrande on rank 1... |
|
0%| | 0/633 [00:00<?, ?it/s]
100%|██████████| 633/633 [00:00<00:00, 25343.35it/s] |
| n136-128-154:2195230:2195230 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2195230:2195230 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195230:2195230 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195230:2195230 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2195230:2195230 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2195230:2195633 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2195230:2195633 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195230:2195633 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2195230:2195633 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2195230:2195633 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2195230:2195633 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2195230:2195633 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2195230:2195633 [1] NCCL INFO Using network IB |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclCommInitRankConfig comm 0x113f3bf0 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xe72ac37abd19e19 - Init START |
| n136-128-154:2195230:2195633 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2195229:2195601 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2195230:2195633 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2195230:2195633 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2195230:2195633 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2195229:2195601 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2195229:2195601 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2195229:2195601 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2195230:2195633 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195230:2195633 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195229:2195601 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195229:2195601 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195229:2195601 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2195229:2195601 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2195229:2195601 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO ========================================== |
| n136-128-154:2195229:2195601 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195229:2195601 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195229:2195601 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2195230:2195633 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2195230:2195633 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2195230:2195633 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO ========================================== |
| n136-128-154:2195230:2195633 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195230:2195633 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195230:2195633 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195230:2195633 [1] NCCL INFO comm 0x113f3bf0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO comm 0x11614ad0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2195230:2195633 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2195229:2195601 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2195230:2195633 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2195230:2195640 [1] NCCL INFO [Proxy Service] Device 1 CPU core 102 |
| n136-128-154:2195230:2195641 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 101 |
| n136-128-154:2195229:2195601 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2195229:2195601 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2195229:2195642 [0] NCCL INFO [Proxy Service] Device 0 CPU core 34 |
| n136-128-154:2195229:2195643 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 103 |
| n136-128-154:2195230:2195633 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2195230:2195633 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2195229:2195601 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2195229:2195601 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2195229:2195601 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2195230:2195633 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2195230:2195633 [1] NCCL INFO ncclCommInitRankConfig comm 0x113f3bf0 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xe72ac37abd19e19 - Init COMPLETE |
| n136-128-154:2195230:2195633 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.42 (kernels 0.18, alloc 0.13, bootstrap 0.00, allgathers 0.00, topo 0.06, graphs 0.00, connections 0.01, rest 0.03) |
| n136-128-154:2195229:2195601 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2195229:2195601 [0] NCCL INFO ncclCommInitRankConfig comm 0x11614ad0 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xe72ac37abd19e19 - Init COMPLETE |
| n136-128-154:2195229:2195601 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 7.75 (kernels 0.17, alloc 0.13, bootstrap 7.35, allgathers 0.00, topo 0.06, graphs 0.00, connections 0.01, rest 0.03) |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195229:2195645 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195230:2195644 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2195229:2195645 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:14:52:29 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:14:52:29 INFO [evaluator:559] Running loglikelihood requests |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 64 |
| Determined largest batch size: 64 |
|
Running loglikelihood requests: 100%|██████████| 1268/1268 [00:21<00:00, 59.52it/s] |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195230:2195672 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195230:2195640 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:14:53:02 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (64) |
| |  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr| |
| |----------|------:|------|-----:|------|---|-----:|---|-----:| |
| |winogrande|      1|none  |     5|acc   |↑  |0.5399|±  | 0.014| |
|
|
| [rank0]:[W1209 14:53:03.125410652 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195229:2195642 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195229:2195763 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195230:2195640 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2195230:2195684 [1] NCCL INFO comm 0x113f3bf0 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2195229:2195763 [0] NCCL INFO comm 0x11614ad0 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task winogrande evaluation complete! |
|
|
| ================================================== |
| Starting evaluation: task=boolq | few-shot count=0 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/boolq.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:14:54:30 INFO [__main__:440] Selected Tasks: ['boolq'] |
| 2025-12-09:14:54:30 INFO [__main__:440] Selected Tasks: ['boolq'] |
| 2025-12-09:14:54:30 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:54:30 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:54:30 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:54:30 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:54:31 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:54:32 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:54:32 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.02it/s] |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.02it/s] |
| 2025-12-09:14:54:36 WARNING [api.task:844] [Task: boolq] metric acc is defined, but aggregation is not. using default aggregation=mean |
| 2025-12-09:14:54:36 WARNING [api.task:856] [Task: boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True |
| 2025-12-09:14:54:36 WARNING [api.task:844] [Task: boolq] metric acc is defined, but aggregation is not. using default aggregation=mean |
| 2025-12-09:14:54:36 WARNING [api.task:856] [Task: boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True |
| 2025-12-09:14:55:00 WARNING [evaluator:309] Overwriting default num_fewshot of boolq from None to 0 |
| 2025-12-09:14:55:00 INFO [api.task:434] Building contexts for boolq on rank 1... |
|
100%|██████████| 1635/1635 [00:00<00:00, 2788.69it/s] |
| 2025-12-09:14:55:00 WARNING [evaluator:309] Overwriting default num_fewshot of boolq from None to 0 |
| 2025-12-09:14:55:00 INFO [api.task:434] Building contexts for boolq on rank 0... |
|
100%|██████████| 1635/1635 [00:00<00:00, 2798.08it/s] |
| n136-128-154:2195817:2195817 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195817:2195817 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195817:2195817 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2195817:2195817 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2195817:2195817 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2195818:2195818 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2195818:2195818 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195818:2195818 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195818:2195818 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2195818:2195818 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2195818:2196077 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2195817:2196076 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2195817:2196076 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2195818:2196077 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195818:2196077 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2195817:2196076 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2195817:2196076 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2195817:2196076 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2195818:2196077 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2195818:2196077 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2195818:2196077 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2195818:2196077 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2195818:2196077 [1] NCCL INFO Using network IB |
| n136-128-154:2195817:2196076 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2195817:2196076 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2195817:2196076 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2195817:2196076 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2195817:2196076 [0] NCCL INFO Using network IB |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclCommInitRankConfig comm 0x10588480 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xcf96f8fae45e0dea - Init START |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclCommInitRankConfig comm 0xf5fa020 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xcf96f8fae45e0dea - Init START |
| n136-128-154:2195817:2196076 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2195818:2196077 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2195818:2196077 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2195818:2196077 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2195818:2196077 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2195817:2196076 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2195817:2196076 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2195817:2196076 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2195818:2196077 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195817:2196076 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195818:2196077 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195817:2196076 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2195817:2196076 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2195817:2196076 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2195817:2196076 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2195817:2196076 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO ========================================== |
| n136-128-154:2195817:2196076 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195817:2196076 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195817:2196076 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2195818:2196077 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2195818:2196077 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2195818:2196077 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO ========================================== |
| n136-128-154:2195818:2196077 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195818:2196077 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2195818:2196077 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2195818:2196077 [1] NCCL INFO comm 0x10588480 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO comm 0xf5fa020 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2195818:2196077 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2195818:2196077 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2195817:2196076 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2195818:2196077 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2195818:2196088 [1] NCCL INFO [Proxy Service] Device 1 CPU core 36 |
| n136-128-154:2195818:2196089 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 101 |
| n136-128-154:2195817:2196076 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2195817:2196076 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2195817:2196090 [0] NCCL INFO [Proxy Service] Device 0 CPU core 102 |
| n136-128-154:2195817:2196091 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 103 |
| n136-128-154:2195818:2196077 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2195818:2196077 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2195817:2196076 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2195817:2196076 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2195817:2196076 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2195818:2196077 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2195818:2196077 [1] NCCL INFO ncclCommInitRankConfig comm 0x10588480 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xcf96f8fae45e0dea - Init COMPLETE |
| n136-128-154:2195818:2196077 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.89 (kernels 0.24, alloc 0.40, bootstrap 0.00, allgathers 0.01, topo 0.06, graphs 0.00, connections 0.06, rest 0.12) |
| n136-128-154:2195817:2196076 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2195817:2196076 [0] NCCL INFO ncclCommInitRankConfig comm 0xf5fa020 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xcf96f8fae45e0dea - Init COMPLETE |
| n136-128-154:2195817:2196076 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 0.89 (kernels 0.25, alloc 0.40, bootstrap 0.00, allgathers 0.01, topo 0.06, graphs 0.00, connections 0.06, rest 0.12) |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195818:2196093 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2195817:2196094 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2195818:2196093 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:14:55:02 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:14:55:02 INFO [evaluator:559] Running loglikelihood requests |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Running loglikelihood requests:   0%|          | 0/3270 [00:00<?, ?it/s] |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 45 |
| Determined largest batch size: 45 |
| Running loglikelihood requests: 100%|██████████| 3270/3270 [01:18<00:00, 41.52it/s] |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2195818:2196244 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195818:2196088 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:14:56:32 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto (45) |
| |Tasks|Version|Filter|n-shot|Metric|   |Value |   |Stderr| |
| |-----|------:|------|-----:|------|---|-----:|---|-----:| |
| |boolq|      2|none  |     0|acc   |↑  |0.6783|±  |0.0082| |
|
|
| [rank0]:[W1209 14:56:33.756217571 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195817:2196090 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2195817:2196313 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2195818:2196088 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2195818:2196249 [1] NCCL INFO comm 0x10588480 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2195817:2196313 [0] NCCL INFO comm 0xf5fa020 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task boolq evaluation complete! |
|
|
| ================================================== |
| Starting evaluation: task=arc_challenge | num_fewshot=25 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/arc_challenge.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:14:57:52 INFO [__main__:440] Selected Tasks: ['arc_challenge'] |
| 2025-12-09:14:57:52 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:57:52 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:57:52 INFO [__main__:440] Selected Tasks: ['arc_challenge'] |
| 2025-12-09:14:57:52 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:14:57:52 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:14:57:53 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:57:53 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s] |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:14:57:53 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.49it/s] |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.43it/s] |
| 2025-12-09:14:58:12 WARNING [evaluator:309] Overwriting default num_fewshot of arc_challenge from None to 25 |
| 2025-12-09:14:58:12 INFO [api.task:434] Building contexts for arc_challenge on rank 0... |
| 2025-12-09:14:58:16 WARNING [evaluator:309] Overwriting default num_fewshot of arc_challenge from None to 25 |
| 2025-12-09:14:58:16 INFO [api.task:434] Building contexts for arc_challenge on rank 1... |
| 100%|██████████| 586/586 [00:15<00:00, 36.63it/s] |
| n136-128-154:2196430:2196430 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2196430:2196430 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2196430:2196430 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2196430:2196430 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2196430:2196430 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2196430:2196729 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2196430:2196729 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2196430:2196729 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2196430:2196729 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2196430:2196729 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2196430:2196729 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2196430:2196729 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2196430:2196729 [0] NCCL INFO Using network IB |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclCommInitRankConfig comm 0x12b70520 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xf75dd042d9e31e3b - Init START |
| 100%|██████████| 586/586 [00:16<00:00, 35.89it/s] |
| n136-128-154:2196431:2196431 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2196431:2196431 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2196431:2196431 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2196431:2196431 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2196431:2196431 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2196431:2196737 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2196431:2196737 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2196431:2196737 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2196431:2196737 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2196431:2196737 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2196431:2196737 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2196431:2196737 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2196431:2196737 [1] NCCL INFO Using network IB |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclCommInitRankConfig comm 0x12d752a0 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xf75dd042d9e31e3b - Init START |
| n136-128-154:2196431:2196737 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2196430:2196729 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2196431:2196737 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2196431:2196737 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2196431:2196737 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2196430:2196729 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2196430:2196729 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2196430:2196729 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2196431:2196737 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2196431:2196737 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2196430:2196729 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2196430:2196729 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2196430:2196729 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2196430:2196729 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2196430:2196729 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO ========================================== |
| n136-128-154:2196430:2196729 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2196430:2196729 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2196430:2196729 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2196431:2196737 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2196431:2196737 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2196431:2196737 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO ========================================== |
| n136-128-154:2196431:2196737 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2196431:2196737 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2196431:2196737 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2196431:2196737 [1] NCCL INFO comm 0x12d752a0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO comm 0x12b70520 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2196431:2196737 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2196430:2196729 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2196431:2196737 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2196431:2196745 [1] NCCL INFO [Proxy Service] Device 1 CPU core 48 |
| n136-128-154:2196431:2196746 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 47 |
| n136-128-154:2196430:2196729 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2196430:2196729 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2196430:2196747 [0] NCCL INFO [Proxy Service] Device 0 CPU core 98 |
| n136-128-154:2196430:2196748 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 37 |
| n136-128-154:2196431:2196737 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2196431:2196737 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2196430:2196729 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2196430:2196729 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2196430:2196729 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2196431:2196737 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2196431:2196737 [1] NCCL INFO ncclCommInitRankConfig comm 0x12d752a0 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xf75dd042d9e31e3b - Init COMPLETE |
| n136-128-154:2196431:2196737 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.47 (kernels 0.12, alloc 0.16, bootstrap 0.00, allgathers 0.00, topo 0.06, graphs 0.00, connections 0.01, rest 0.12) |
| n136-128-154:2196430:2196729 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2196430:2196729 [0] NCCL INFO ncclCommInitRankConfig comm 0x12b70520 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xf75dd042d9e31e3b - Init COMPLETE |
| n136-128-154:2196430:2196729 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 4.57 (kernels 0.13, alloc 0.16, bootstrap 4.09, allgathers 0.00, topo 0.06, graphs 0.00, connections 0.02, rest 0.11) |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196431:2196749 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2196430:2196750 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2196431:2196749 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:14:58:33 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:14:58:33 INFO [evaluator:559] Running loglikelihood requests |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 45 |
| Determined largest batch size: 45 |
|
Running loglikelihood requests: 100%|██████████| 2344/2344 [03:27<00:00, 11.30it/s] |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2196431:2197004 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2196431:2196745 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:15:02:23 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (45) |
| | Tasks       |Version|Filter|n-shot| Metric |   |Value |   |Stderr| |
| |-------------|------:|------|-----:|--------|---|-----:|---|-----:| |
| |arc_challenge|      1|none  |    25|acc     |↑  |0.3677|±  |0.0141| |
| |             |       |none  |    25|acc_norm|↑  |0.4155|±  |0.0144| |
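The Stderr column above can be sanity-checked by hand. The sketch below assumes that the harness reports the sample standard error of a mean over per-document 0/1 scores, and that the ARC-Challenge test split has 1172 documents. Neither figure is stated in this log, and `binomial_stderr` and `N_DOCS` are hypothetical names introduced only for illustration.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 outcomes whose mean is acc.

    The sample variance of 0/1 data is n/(n-1) * acc*(1-acc); dividing by n
    and taking the square root gives sqrt(acc*(1-acc)/(n-1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumption: number of documents in the ARC-Challenge test split.
N_DOCS = 1172

# (value, stderr) pairs copied from the results table in the log.
reported = {"acc": (0.3677, 0.0141), "acc_norm": (0.4155, 0.0144)}

for name, (value, stderr) in reported.items():
    recomputed = binomial_stderr(value, N_DOCS)
    print(f"{name}: reported {stderr:.4f}, recomputed {recomputed:.4f}")
```

Under these assumptions the recomputed values agree with the table to four decimal places, which suggests the reported uncertainty reflects only the size of the test split and not any variation between runs.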
|
|
| [rank0]:[W1209 15:02:23.092941084 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2196430:2196747 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2196431:2196745 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2196430:2197059 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2196431:2197009 [1] NCCL INFO comm 0x12d752a0 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2196430:2197059 [0] NCCL INFO comm 0x12b70520 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task arc_challenge evaluation complete! |
|
|
| ================================================== |
| Starting evaluation: task=truthfulqa_mc1 | few-shot count=0 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/truthfulqa_mc1.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:15:03:52 INFO [__main__:440] Selected Tasks: ['truthfulqa_mc1'] |
| 2025-12-09:15:03:52 INFO [__main__:440] Selected Tasks: ['truthfulqa_mc1'] |
| 2025-12-09:15:03:52 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:03:52 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:03:52 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:03:52 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:03:53 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:03:53 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:03:53 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.22it/s] |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.21it/s] |
| 2025-12-09:15:04:13 INFO [evaluator:305] num_fewshot has been set to 0 for truthfulqa_mc1 in its config. Manual configuration will be ignored. |
| 2025-12-09:15:04:13 INFO [api.task:434] Building contexts for truthfulqa_mc1 on rank 1... |
|
100%|██████████| 408/408 [00:00<00:00, 975.42it/s] |
| 2025-12-09:15:04:16 INFO [evaluator:305] num_fewshot has been set to 0 for truthfulqa_mc1 in its config. Manual configuration will be ignored. |
| 2025-12-09:15:04:16 INFO [api.task:434] Building contexts for truthfulqa_mc1 on rank 0... |
|
100%|██████████| 409/409 [00:00<00:00, 1019.63it/s] |
| n136-128-154:2197115:2197115 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197115:2197115 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197115:2197115 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2197115:2197115 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2197115:2197115 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2197116:2197116 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2197116:2197116 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197116:2197116 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197116:2197116 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2197116:2197116 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2197115:2197344 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2197116:2197345 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2197115:2197344 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2197115:2197344 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197115:2197344 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2197116:2197345 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197116:2197345 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2197115:2197344 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2197115:2197344 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2197115:2197344 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2197115:2197344 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2197115:2197344 [0] NCCL INFO Using network IB |
| n136-128-154:2197116:2197345 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2197116:2197345 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2197116:2197345 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2197116:2197345 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2197116:2197345 [1] NCCL INFO Using network IB |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclCommInitRankConfig comm 0x1029d480 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xa98f936c31921541 - Init START |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclCommInitRankConfig comm 0xea51340 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xa98f936c31921541 - Init START |
| n136-128-154:2197115:2197344 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2197116:2197345 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2197116:2197345 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2197116:2197345 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2197116:2197345 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2197115:2197344 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2197115:2197344 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197116:2197345 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197115:2197344 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197116:2197345 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197115:2197344 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2197115:2197344 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2197115:2197344 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO ========================================== |
| n136-128-154:2197115:2197344 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197115:2197344 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197115:2197344 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2197116:2197345 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2197116:2197345 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2197116:2197345 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO ========================================== |
| n136-128-154:2197116:2197345 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197116:2197345 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197116:2197345 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197115:2197344 [0] NCCL INFO comm 0x1029d480 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2197116:2197345 [1] NCCL INFO comm 0xea51340 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2197116:2197345 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2197115:2197344 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2197115:2197344 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2197115:2197344 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2197115:2197358 [0] NCCL INFO [Proxy Service] Device 0 CPU core 98 |
| n136-128-154:2197115:2197359 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 35 |
| n136-128-154:2197116:2197345 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2197116:2197360 [1] NCCL INFO [Proxy Service] Device 1 CPU core 115 |
| n136-128-154:2197116:2197361 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 53 |
| n136-128-154:2197115:2197344 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2197115:2197344 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2197115:2197344 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2197116:2197345 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2197116:2197345 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2197115:2197344 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2197115:2197344 [0] NCCL INFO ncclCommInitRankConfig comm 0x1029d480 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0xa98f936c31921541 - Init COMPLETE |
| n136-128-154:2197115:2197344 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 0.86 (kernels 0.22, alloc 0.44, bootstrap 0.00, allgathers 0.01, topo 0.05, graphs 0.00, connections 0.01, rest 0.12) |
| n136-128-154:2197116:2197345 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2197116:2197345 [1] NCCL INFO ncclCommInitRankConfig comm 0xea51340 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0xa98f936c31921541 - Init COMPLETE |
| n136-128-154:2197116:2197345 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.85 (kernels 0.22, alloc 0.44, bootstrap 0.00, allgathers 0.00, topo 0.05, graphs 0.00, connections 0.01, rest 0.12) |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197363 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197115:2197362 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2197116:2197363 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:15:04:17 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:15:04:17 INFO [evaluator:559] Running loglikelihood requests |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Running loglikelihood requests: 0%| | 0/2066 [00:00<?, ?it/s] |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 64 |
| Determined largest batch size: 64 |
| Running loglikelihood requests: 100%|██████████| 2066/2066 [00:40<00:00, 51.41it/s] |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197116:2197410 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197116:2197360 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:15:05:10 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto (64) |
| | Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr| |
| |--------------|------:|------|-----:|------|---|-----:|---|-----:| |
| |truthfulqa_mc1| 2|none | 0|acc |↑ |0.2827|± |0.0158| |
|
|
| [rank0]:[W1209 15:05:11.708249790 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197115:2197358 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197116:2197360 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197115:2197467 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197116:2197415 [1] NCCL INFO comm 0xea51340 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2197115:2197467 [0] NCCL INFO comm 0x1029d480 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task truthfulqa_mc1 evaluation complete! |
|
|
| ================================================== |
| Starting evaluation: task=piqa | num_fewshot=0 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/piqa.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:15:06:33 INFO [__main__:440] Selected Tasks: ['piqa'] |
| 2025-12-09:15:06:33 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:06:33 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:06:33 INFO [__main__:440] Selected Tasks: ['piqa'] |
| 2025-12-09:15:06:33 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:06:33 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:06:34 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:06:34 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:06:34 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.45it/s] |
|
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.41it/s] |
| 2025-12-09:15:06:54 WARNING [evaluator:309] Overwriting default num_fewshot of piqa from None to 0 |
| 2025-12-09:15:06:54 INFO [api.task:434] Building contexts for piqa on rank 0... |
|
100%|██████████| 919/919 [00:00<00:00, 1486.63it/s] |
| n136-128-154:2197525:2197525 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197525:2197525 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197525:2197525 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2197525:2197525 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2197525:2197525 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2197525:2197936 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2197525:2197936 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197525:2197936 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2197525:2197936 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2197525:2197936 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2197525:2197936 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2197525:2197936 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2197525:2197936 [0] NCCL INFO Using network IB |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclCommInitRankConfig comm 0x10631150 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0x2ac4a3924f989c1 - Init START |
| 2025-12-09:15:06:59 WARNING [evaluator:309] Overwriting default num_fewshot of piqa from None to 0 |
| 2025-12-09:15:06:59 INFO [api.task:434] Building contexts for piqa on rank 1... |
|
100%|██████████| 919/919 [00:00<00:00, 1461.79it/s] |
| n136-128-154:2197526:2197526 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2197526:2197526 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197526:2197526 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197526:2197526 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2197526:2197526 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2197526:2197979 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2197526:2197979 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2197526:2197979 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2197526:2197979 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2197526:2197979 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2197526:2197979 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2197526:2197979 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2197526:2197979 [1] NCCL INFO Using network IB |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclCommInitRankConfig comm 0xffc5770 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0x2ac4a3924f989c1 - Init START |
| n136-128-154:2197526:2197979 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2197525:2197936 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2197526:2197979 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2197526:2197979 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2197526:2197979 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2197525:2197936 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2197525:2197936 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2197525:2197936 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2197525:2197936 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197525:2197936 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197526:2197979 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197526:2197979 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2197525:2197936 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2197525:2197936 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2197526:2197979 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2197525:2197936 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2197525:2197936 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO ========================================== |
| n136-128-154:2197526:2197979 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2197525:2197936 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197526:2197979 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO ========================================== |
| n136-128-154:2197525:2197936 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2197526:2197979 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197526:2197979 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2197526:2197979 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2197526:2197979 [1] NCCL INFO comm 0xffc5770 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO comm 0x10631150 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2197526:2197979 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2197525:2197936 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2197526:2197979 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2197526:2197986 [1] NCCL INFO [Proxy Service] Device 1 CPU core 38 |
| n136-128-154:2197526:2197987 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 40 |
| n136-128-154:2197525:2197936 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2197525:2197936 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2197525:2197988 [0] NCCL INFO [Proxy Service] Device 0 CPU core 100 |
| n136-128-154:2197525:2197989 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 105 |
| n136-128-154:2197525:2197936 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2197525:2197936 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2197525:2197936 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2197526:2197979 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2197526:2197979 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2197526:2197979 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2197526:2197979 [1] NCCL INFO ncclCommInitRankConfig comm 0xffc5770 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0x2ac4a3924f989c1 - Init COMPLETE |
| n136-128-154:2197526:2197979 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.37 (kernels 0.14, alloc 0.14, bootstrap 0.00, allgathers 0.00, topo 0.05, graphs 0.00, connections 0.01, rest 0.03) |
| n136-128-154:2197525:2197936 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2197525:2197936 [0] NCCL INFO ncclCommInitRankConfig comm 0x10631150 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0x2ac4a3924f989c1 - Init COMPLETE |
| n136-128-154:2197525:2197936 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 4.96 (kernels 0.17, alloc 0.11, bootstrap 4.58, allgathers 0.00, topo 0.05, graphs 0.00, connections 0.01, rest 0.03) |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197526:2197990 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2197525:2197991 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2197526:2197990 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:15:07:00 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:15:07:00 INFO [evaluator:559] Running loglikelihood requests |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Running loglikelihood requests: 0%| | 0/1838 [00:00<?, ?it/s] |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 64 |
| Determined largest batch size: 64 |
| Running loglikelihood requests: 100%|██████████| 1838/1838 [00:18<00:00, 96.87it/s] |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2197526:2198033 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197526:2197986 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:15:07:30 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto (64) |
| |Tasks|Version|Filter|n-shot| Metric |   |Value |   |Stderr| |
| |-----|------:|------|-----:|--------|---|-----:|---|-----:| |
| |piqa |      1|none  |     0|acc     |↑  |0.6752|±  |0.0109| |
| |     |       |none  |     0|acc_norm|↑  |0.6567|±  |0.0111| |
|
|
| [rank0]:[W1209 15:07:30.153193927 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197525:2197988 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2197526:2197986 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2197525:2198091 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2197526:2198037 [1] NCCL INFO comm 0xffc5770 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2197525:2198091 [0] NCCL INFO comm 0x10631150 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task piqa evaluation complete! |
|
|
| ================================================== |
| Starting evaluation: task=hellaswag | few-shot count=10 | model=Qwen2.5-7B-quantization-fg |
| Output path: results2/Qwen2.5-7B-quantization-fg/fg5/hellaswag.json |
| ================================================== |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in ` |
| ` |
| ` |
| ` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 2025-12-09:15:08:56 INFO [__main__:440] Selected Tasks: ['hellaswag'] |
| 2025-12-09:15:08:56 INFO [__main__:440] Selected Tasks: ['hellaswag'] |
| 2025-12-09:15:08:56 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:08:56 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:08:56 INFO [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 |
| 2025-12-09:15:08:56 INFO [evaluator:227] Initializing hf model, with arguments: {'pretrained': '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg'} |
| 2025-12-09:15:08:57 WARNING [accelerate.utils.other:513] Detected kernel version 5.4.143, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:08:58 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:1'} |
| The tokenizer you are loading from '/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue. |
| 2025-12-09:15:08:58 INFO [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.68it/s] |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.49it/s] |
| 2025-12-09:15:09:17 WARNING [evaluator:309] Overwriting default num_fewshot of hellaswag from None to 10 |
| 2025-12-09:15:09:17 INFO [api.task:434] Building contexts for hellaswag on rank 0... |
| 2025-12-09:15:09:19 WARNING [evaluator:309] Overwriting default num_fewshot of hellaswag from None to 10 |
| 2025-12-09:15:09:19 INFO [api.task:434] Building contexts for hellaswag on rank 1... |
|
68%|โโโโโโโ | 3391/5021 [00:17<00:08, 192.61it/s]
61%|โโโโโโโ | 3078/5021 [00:15<00:10, 193.12it/s]
68%|โโโโโโโ | 3411/5021 [00:17<00:08, 193.64it/s]
62%|โโโโโโโ | 3098/5021 [00:16<00:09, 193.99it/s]
68%|โโโโโโโ | 3431/5021 [00:17<00:08, 193.74it/s]
62%|โโโโโโโ | 3118/5021 [00:16<00:09, 195.16it/s]
69%|โโโโโโโ | 3451/5021 [00:17<00:08, 193.99it/s]
62%|โโโโโโโ | 3138/5021 [00:16<00:09, 195.98it/s]
69%|โโโโโโโ | 3471/5021 [00:18<00:07, 194.81it/s]
63%|โโโโโโโ | 3158/5021 [00:16<00:09, 196.38it/s]
70%|โโโโโโโ | 3491/5021 [00:18<00:07, 194.77it/s]
63%|โโโโโโโ | 3178/5021 [00:16<00:09, 196.84it/s]
70%|โโโโโโโ | 3511/5021 [00:18<00:07, 194.71it/s]
64%|โโโโโโโ | 3198/5021 [00:16<00:09, 196.24it/s]
70%|โโโโโโโ | 3531/5021 [00:18<00:07, 194.80it/s]
64%|โโโโโโโ | 3218/5021 [00:16<00:09, 196.25it/s]
71%|โโโโโโโ | 3551/5021 [00:18<00:07, 194.49it/s]
64%|โโโโโโโ | 3238/5021 [00:16<00:09, 194.90it/s]
71%|โโโโโโโ | 3571/5021 [00:18<00:07, 194.73it/s]
65%|โโโโโโโ | 3258/5021 [00:16<00:09, 195.23it/s]
72%|โโโโโโโโ | 3591/5021 [00:18<00:07, 189.59it/s]
65%|โโโโโโโ | 3278/5021 [00:16<00:09, 189.33it/s]
72%|โโโโโโโโ | 3610/5021 [00:18<00:07, 188.67it/s]
66%|โโโโโโโ | 3298/5021 [00:17<00:09, 190.71it/s]
72%|โโโโโโโโ | 3629/5021 [00:18<00:07, 187.08it/s]
66%|โโโโโโโ | 3318/5021 [00:17<00:08, 191.93it/s]
73%|โโโโโโโโ | 3649/5021 [00:19<00:07, 189.71it/s]
66%|โโโโโโโ | 3338/5021 [00:17<00:08, 193.70it/s]
73%|โโโโโโโโ | 3669/5021 [00:19<00:07, 191.23it/s]
67%|โโโโโโโ | 3358/5021 [00:17<00:08, 194.50it/s]
73%|โโโโโโโโ | 3689/5021 [00:19<00:06, 192.70it/s]
67%|โโโโโโโ | 3378/5021 [00:17<00:08, 195.27it/s]
74%|โโโโโโโโ | 3709/5021 [00:19<00:06, 193.37it/s]
68%|โโโโโโโ | 3398/5021 [00:17<00:08, 195.73it/s]
74%|โโโโโโโโ | 3729/5021 [00:19<00:06, 193.96it/s]
68%|โโโโโโโ | 3418/5021 [00:17<00:08, 196.24it/s]
75%|โโโโโโโโ | 3749/5021 [00:19<00:06, 194.20it/s]
68%|โโโโโโโ | 3438/5021 [00:17<00:08, 196.60it/s]
75%|โโโโโโโโ | 3769/5021 [00:19<00:06, 194.09it/s]
69%|โโโโโโโ | 3458/5021 [00:17<00:07, 196.94it/s]
75%|โโโโโโโโ | 3789/5021 [00:19<00:06, 194.76it/s]
69%|โโโโโโโ | 3478/5021 [00:17<00:07, 197.23it/s]
76%|โโโโโโโโ | 3809/5021 [00:19<00:06, 195.22it/s]
70%|โโโโโโโ | 3498/5021 [00:18<00:07, 197.74it/s]
76%|โโโโโโโโ | 3829/5021 [00:19<00:06, 195.37it/s]
70%|โโโโโโโ | 3518/5021 [00:18<00:07, 198.04it/s]
77%|โโโโโโโโ | 3849/5021 [00:20<00:05, 195.99it/s]
70%|โโโโโโโ | 3538/5021 [00:18<00:07, 196.74it/s]
77%|โโโโโโโโ | 3869/5021 [00:20<00:05, 195.83it/s]
71%|โโโโโโโ | 3558/5021 [00:18<00:07, 196.25it/s]
77%|โโโโโโโโ | 3889/5021 [00:20<00:05, 195.10it/s]
71%|โโโโโโโโ | 3578/5021 [00:18<00:07, 195.60it/s]
78%|โโโโโโโโ | 3909/5021 [00:20<00:05, 194.69it/s]
72%|โโโโโโโโ | 3598/5021 [00:18<00:07, 194.72it/s]
78%|โโโโโโโโ | 3929/5021 [00:20<00:05, 193.26it/s]
72%|โโโโโโโโ | 3618/5021 [00:18<00:07, 194.60it/s]
79%|โโโโโโโโ | 3949/5021 [00:20<00:05, 190.34it/s]
72%|โโโโโโโโ | 3638/5021 [00:18<00:07, 194.07it/s]
79%|โโโโโโโโ | 3969/5021 [00:20<00:05, 189.61it/s]
73%|โโโโโโโโ | 3658/5021 [00:18<00:07, 194.19it/s]
79%|โโโโโโโโ | 3988/5021 [00:20<00:05, 189.27it/s]
73%|โโโโโโโโ | 3678/5021 [00:18<00:06, 193.64it/s]
80%|โโโโโโโโ | 4007/5021 [00:20<00:05, 189.26it/s]
74%|โโโโโโโโ | 3698/5021 [00:19<00:06, 192.89it/s]
80%|โโโโโโโโ | 4026/5021 [00:20<00:05, 189.29it/s]
74%|โโโโโโโโ | 3718/5021 [00:19<00:06, 191.37it/s]
81%|โโโโโโโโ | 4045/5021 [00:21<00:05, 189.00it/s]
74%|โโโโโโโโ | 3738/5021 [00:19<00:06, 190.15it/s]
81%|โโโโโโโโ | 4064/5021 [00:21<00:05, 189.24it/s]
75%|โโโโโโโโ | 3758/5021 [00:19<00:06, 189.26it/s]
81%|โโโโโโโโโ | 4084/5021 [00:21<00:04, 189.65it/s]
75%|โโโโโโโโ | 3777/5021 [00:19<00:06, 189.02it/s]
82%|โโโโโโโโโ | 4104/5021 [00:21<00:04, 190.50it/s]
76%|โโโโโโโโ | 3796/5021 [00:19<00:06, 189.16it/s]
82%|โโโโโโโโโ | 4124/5021 [00:21<00:04, 191.42it/s]
76%|โโโโโโโโ | 3816/5021 [00:19<00:06, 190.32it/s]
83%|โโโโโโโโโ | 4144/5021 [00:21<00:04, 191.61it/s]
76%|โโโโโโโโ | 3836/5021 [00:19<00:06, 189.21it/s]
83%|โโโโโโโโโ | 4164/5021 [00:21<00:04, 191.26it/s]
77%|โโโโโโโโ | 3856/5021 [00:19<00:06, 190.55it/s]
83%|โโโโโโโโโ | 4184/5021 [00:21<00:04, 190.52it/s]
77%|โโโโโโโโ | 3876/5021 [00:20<00:05, 191.66it/s]
84%|โโโโโโโโโ | 4204/5021 [00:21<00:04, 190.84it/s]
78%|โโโโโโโโ | 3896/5021 [00:20<00:05, 192.14it/s]
84%|โโโโโโโโโ | 4224/5021 [00:21<00:04, 190.44it/s]
78%|โโโโโโโโ | 3916/5021 [00:20<00:05, 193.06it/s]
85%|โโโโโโโโโ | 4244/5021 [00:22<00:04, 190.13it/s]
78%|โโโโโโโโ | 3936/5021 [00:20<00:05, 193.84it/s]
85%|โโโโโโโโโ | 4264/5021 [00:22<00:04, 189.23it/s]
79%|โโโโโโโโ | 3956/5021 [00:20<00:05, 194.63it/s]
85%|โโโโโโโโโ | 4283/5021 [00:22<00:03, 189.27it/s]
79%|โโโโโโโโ | 3976/5021 [00:20<00:05, 190.77it/s]
86%|โโโโโโโโโ | 4302/5021 [00:22<00:03, 188.10it/s]
80%|โโโโโโโโ | 3996/5021 [00:20<00:05, 187.78it/s]
86%|โโโโโโโโโ | 4321/5021 [00:22<00:03, 187.61it/s]
80%|โโโโโโโโ | 4015/5021 [00:20<00:05, 185.75it/s]
86%|โโโโโโโโโ | 4340/5021 [00:22<00:03, 188.19it/s]
80%|โโโโโโโโ | 4035/5021 [00:20<00:05, 187.82it/s]
87%|โโโโโโโโโ | 4359/5021 [00:22<00:03, 188.68it/s]
81%|โโโโโโโโ | 4054/5021 [00:20<00:05, 187.19it/s]
87%|โโโโโโโโโ | 4379/5021 [00:22<00:03, 189.35it/s]
81%|โโโโโโโโ | 4073/5021 [00:21<00:05, 185.95it/s]
88%|โโโโโโโโโ | 4399/5021 [00:22<00:03, 190.01it/s]
81%|โโโโโโโโโ | 4092/5021 [00:21<00:05, 185.63it/s]
88%|โโโโโโโโโ | 4419/5021 [00:23<00:03, 189.91it/s]
82%|โโโโโโโโโ | 4111/5021 [00:21<00:04, 186.47it/s]
88%|โโโโโโโโโ | 4439/5021 [00:23<00:03, 190.38it/s]
82%|โโโโโโโโโ | 4130/5021 [00:21<00:04, 185.68it/s]
89%|โโโโโโโโโ | 4459/5021 [00:23<00:03, 182.55it/s]
83%|โโโโโโโโโ | 4149/5021 [00:21<00:04, 179.46it/s]
89%|โโโโโโโโโ | 4479/5021 [00:23<00:02, 185.70it/s]
83%|โโโโโโโโโ | 4169/5021 [00:21<00:04, 184.62it/s]
90%|โโโโโโโโโ | 4499/5021 [00:23<00:02, 188.36it/s]
83%|โโโโโโโโโ | 4188/5021 [00:21<00:04, 185.47it/s]
90%|โโโโโโโโโ | 4519/5021 [00:23<00:02, 189.85it/s]
84%|โโโโโโโโโ | 4208/5021 [00:21<00:04, 187.82it/s]
90%|โโโโโโโโโ | 4539/5021 [00:23<00:02, 191.06it/s]
84%|โโโโโโโโโ | 4227/5021 [00:21<00:04, 187.11it/s]
91%|โโโโโโโโโ | 4559/5021 [00:23<00:02, 191.43it/s]
85%|โโโโโโโโโ | 4246/5021 [00:22<00:04, 187.47it/s]
91%|โโโโโโโโโ | 4579/5021 [00:23<00:02, 190.91it/s]
85%|โโโโโโโโโ | 4265/5021 [00:22<00:04, 187.35it/s]
92%|โโโโโโโโโโ| 4599/5021 [00:23<00:02, 190.40it/s]
85%|โโโโโโโโโ | 4284/5021 [00:22<00:03, 187.66it/s]
92%|โโโโโโโโโโ| 4619/5021 [00:24<00:02, 190.76it/s]
86%|โโโโโโโโโ | 4303/5021 [00:22<00:03, 187.60it/s]
92%|โโโโโโโโโโ| 4639/5021 [00:24<00:01, 191.90it/s]
86%|โโโโโโโโโ | 4322/5021 [00:22<00:03, 187.68it/s]
93%|โโโโโโโโโโ| 4659/5021 [00:24<00:01, 191.11it/s]
86%|โโโโโโโโโ | 4341/5021 [00:22<00:03, 186.24it/s]
93%|โโโโโโโโโโ| 4679/5021 [00:24<00:01, 190.83it/s]
87%|โโโโโโโโโ | 4360/5021 [00:22<00:03, 187.08it/s]
94%|โโโโโโโโโโ| 4699/5021 [00:24<00:01, 191.09it/s]
87%|โโโโโโโโโ | 4380/5021 [00:22<00:03, 188.33it/s]
94%|โโโโโโโโโโ| 4719/5021 [00:24<00:01, 192.10it/s]
88%|โโโโโโโโโ | 4400/5021 [00:22<00:03, 188.90it/s]
94%|โโโโโโโโโโ| 4739/5021 [00:24<00:01, 192.80it/s]
88%|โโโโโโโโโ | 4419/5021 [00:22<00:03, 182.46it/s]
95%|โโโโโโโโโโ| 4759/5021 [00:24<00:01, 191.67it/s]
88%|โโโโโโโโโ | 4438/5021 [00:23<00:03, 182.01it/s]
95%|โโโโโโโโโโ| 4779/5021 [00:24<00:01, 184.20it/s]
89%|โโโโโโโโโ | 4457/5021 [00:23<00:03, 182.07it/s]
96%|โโโโโโโโโโ| 4799/5021 [00:25<00:01, 186.12it/s]
89%|โโโโโโโโโ | 4476/5021 [00:23<00:02, 182.24it/s]
96%|โโโโโโโโโโ| 4819/5021 [00:25<00:01, 188.05it/s]
90%|โโโโโโโโโ | 4495/5021 [00:23<00:02, 181.15it/s]
96%|โโโโโโโโโโ| 4839/5021 [00:25<00:00, 189.35it/s]
90%|โโโโโโโโโ | 4514/5021 [00:23<00:02, 178.57it/s]
97%|โโโโโโโโโโ| 4859/5021 [00:25<00:00, 190.47it/s]
90%|โโโโโโโโโ | 4532/5021 [00:23<00:02, 177.28it/s]
97%|โโโโโโโโโโ| 4879/5021 [00:25<00:00, 191.50it/s]
91%|โโโโโโโโโ | 4550/5021 [00:23<00:02, 176.27it/s]
98%|โโโโโโโโโโ| 4899/5021 [00:25<00:00, 192.20it/s]
91%|โโโโโโโโโ | 4568/5021 [00:23<00:02, 176.10it/s]
98%|โโโโโโโโโโ| 4919/5021 [00:25<00:00, 192.60it/s]
91%|โโโโโโโโโโ| 4586/5021 [00:23<00:02, 175.91it/s]
98%|โโโโโโโโโโ| 4939/5021 [00:25<00:00, 192.78it/s]
92%|โโโโโโโโโโ| 4604/5021 [00:23<00:02, 176.05it/s]
99%|โโโโโโโโโโ| 4959/5021 [00:25<00:00, 193.20it/s]
92%|โโโโโโโโโโ| 4622/5021 [00:24<00:02, 176.48it/s]
99%|โโโโโโโโโโ| 4979/5021 [00:25<00:00, 193.56it/s]
92%|โโโโโโโโโโ| 4641/5021 [00:24<00:02, 178.04it/s]
100%|โโโโโโโโโโ| 4999/5021 [00:26<00:00, 192.61it/s]
93%|โโโโโโโโโโ| 4660/5021 [00:24<00:01, 181.23it/s]
100%|โโโโโโโโโโ| 5019/5021 [00:26<00:00, 193.04it/s]
100%|██████████| 5021/5021 [00:26<00:00, 191.72it/s] |
| n136-128-154:2198157:2198157 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2198157:2198157 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2198157:2198157 [0] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2198157:2198157 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2198157:2198157 [0] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2198157:2198515 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2198157:2198515 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2198157:2198515 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2198157:2198515 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2198157:2198515 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2198157:2198515 [0] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2198157:2198515 [0] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2198157:2198515 [0] NCCL INFO Using network IB |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclCommInitRankConfig comm 0x13b13560 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0x8dc6d77cdbe28b76 - Init START |
|
100%|██████████| 5021/5021 [00:26<00:00, 191.47it/s] |
| n136-128-154:2198158:2198158 [1] NCCL INFO cudaDriverVersion 12040 |
| n136-128-154:2198158:2198158 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2198158:2198158 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2198158:2198158 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 |
| n136-128-154:2198158:2198158 [1] NCCL INFO Comm config Blocking set to 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO NET/Plugin: Could not find: none libnccl-net-none.so. |
| n136-128-154:2198158:2198521 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. |
| n136-128-154:2198158:2198521 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 |
| n136-128-154:2198158:2198521 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to =eth0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO NCCL_IB_HCA set to mlx5 |
| n136-128-154:2198158:2198521 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. |
| n136-128-154:2198158:2198521 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:fdbd:dc03:9:451::154<0> |
| n136-128-154:2198158:2198521 [1] NCCL INFO Initialized NET plugin IB |
| n136-128-154:2198158:2198521 [1] NCCL INFO Assigned NET plugin IB to comm |
| n136-128-154:2198158:2198521 [1] NCCL INFO Using network IB |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclCommInitRankConfig comm 0xf821860 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0x8dc6d77cdbe28b76 - Init START |
| n136-128-154:2198158:2198521 [1] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2198157:2198515 [0] NCCL INFO RAS client listening socket at ::1<28028> |
| n136-128-154:2198157:2198515 [0] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2198157:2198515 [0] NCCL INFO Retrieving state for IB |
| n136-128-154:2198157:2198515 [0] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO TOPO/NET : Importing network plugins to topology |
| n136-128-154:2198158:2198521 [1] NCCL INFO Retrieving state for IB |
| n136-128-154:2198158:2198521 [1] NCCL INFO Initialized state 0 for IB |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_0 in topo with pciPath=/sys/devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:1b:00.0/0000:1c:00.0/0000:1d:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_1 in topo with pciPath=/sys/devices/pci0000:43/0000:43:02.0/0000:44:00.0/0000:45:08.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0 keep=1 coll=(null) |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_2 in topo with pciPath=/sys/devices/pci0000:82/0000:82:02.0/0000:83:00.0/0000:84:08.0/0000:93:00.0/0000:94:00.0/0000:95:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclTopoPopulateNics : Filled mlx5_3 in topo with pciPath=/sys/devices/pci0000:be/0000:be:02.0/0000:bf:00.0/0000:c0:04.0/0000:ca:00.0/0000:cb:10.0/0000:cc:00.0 keep=1 coll=(null) |
| n136-128-154:2198158:2198521 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2198158:2198521 [1] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2198157:2198515 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2198157:2198515 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 1 / HCA 3 (distance 5 <= 5), read 0 mode Default |
| n136-128-154:2198157:2198515 [0] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2198157:2198515 [0] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2198157:2198515 [0] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO ========================================== |
| n136-128-154:2198157:2198515 [0] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2198157:2198515 [0] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2198157:2198515 [0] NCCL INFO Setting affinity for GPU 6 to 33-62,97-126 |
| n136-128-154:2198158:2198521 [1] NCCL INFO === System : maxBw 240.0 totalBw 240.0 === |
| n136-128-154:2198158:2198521 [1] NCCL INFO CPU/0-1 (1/1/2) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-83000 (1000c0101000ffff) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - NIC/0-95000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-bf000 (1000c0101000ffff) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-c3000 (1000c01010de13b8) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - GPU/0-c5000 (0) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-c7000 (1000c01010de13b8) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - GPU/0-c9000 (1) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + NVL[240.0] - NVS/0-0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - NIC/0-cc000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + SYS[10.0] - CPU/0-0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO CPU/0-0 (1/1/2) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-a000 (1000c0101000ffff) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - NIC/0-1d000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - PCI/0-44000 (1000c0101000ffff) |
| n136-128-154:2198158:2198521 [1] NCCL INFO + PCI[24.0] - NIC/0-60000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO + SYS[10.0] - CPU/0-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO ========================================== |
| n136-128-154:2198158:2198521 [1] NCCL INFO GPU/0-c5000 :GPU/0-c5000 (0/5000.0/LOC) GPU/0-c9000 (2/240.0/NVL) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2198158:2198521 [1] NCCL INFO GPU/0-c9000 :GPU/0-c5000 (2/240.0/NVL) GPU/0-c9000 (0/5000.0/LOC) NVS/0-0 (1/240.0/NVL) CPU/0-1 (3/24.0/PHB) CPU/0-0 (4/10.0/SYS) |
| n136-128-154:2198158:2198521 [1] NCCL INFO Setting affinity for GPU 7 to 33-62,97-126 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 20.000000/20.000000, type NVL/PIX, sameChannels 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 6 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 7 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 8 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 9 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 10 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 11 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 40.000000/40.000000, type NVL/PIX, sameChannels 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 0 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 1 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 2 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 3 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 4 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 5 : GPU/0-c5000 GPU/0-c9000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 6 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 7 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 8 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 9 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 10 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 11 : GPU/0-c9000 GPU/0-c5000 |
| n136-128-154:2198157:2198515 [0] NCCL INFO comm 0x13b13560 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0 |
| n136-128-154:2198158:2198521 [1] NCCL INFO comm 0xf821860 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 0 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 12 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 1 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 13 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 2 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 14 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 3 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 15 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 4 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 16 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 5 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 6 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 17 : 0 -> 1 -> -1/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 18 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 6 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 7 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 18 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 19 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 7 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 8 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 19 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 20 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 8 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 9 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 20 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 21 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 9 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 10 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 21 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 22 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 10 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 11 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 22 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Tree 23 : 1 -> 0 -> -1/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 11 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Tree 23 : -1 -> 1 -> 0/-1/-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 00/24 : 0 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 01/24 : 0 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 02/24 : 0 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 03/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 00 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 04/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 01 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 05/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 02 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 06/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 03 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 07/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 04 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 08/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 05 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 09/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 06 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 10/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 07 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 11/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 08 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 12/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 09 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 13/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 10 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 14/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 11 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 15/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 12 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 16/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 13 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 17/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 14 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 18/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 15 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 19/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 16 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 20/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 17 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 21/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 18 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 22/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 19 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Channel 23/24 : 0 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 20 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 21 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 22 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 02 : 1 -> 0 -> 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Ring 23 : 0 -> 1 -> 0 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 03 : 1 -> 0 -> 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] -1/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] 0/-1/-1->1->-1 [9] 0/-1/-1->1->-1 [10] 0/-1/-1->1->-1 [11] 0/-1/-1->1->-1 [12] -1/-1/-1->1->0 [13] -1/-1/-1->1->0 [14] -1/-1/-1->1->0 [15] -1/-1/-1->1->0 [16] -1/-1/-1->1->0 [17] -1/-1/-1->1->0 [18] 0/-1/-1->1->-1 [19] 0/-1/-1->1->-1 [20] 0/-1/-1->1->-1 [21] 0/-1/-1->1->-1 [22] 0/-1/-1->1->-1 [23] 0/-1/-1->1->-1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 04 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 05 : 1 -> 0 -> 1 |
| n136-128-154:2198158:2198521 [1] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 06 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 07 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 08 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 09 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 10 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 11 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 12 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 13 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 14 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 15 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 16 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 17 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 18 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 19 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 20 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 21 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 22 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Ring 23 : 1 -> 0 -> 1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1 |
| n136-128-154:2198157:2198515 [0] NCCL INFO P2P Chunksize set to 524288 |
| n136-128-154:2198158:2198521 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2198158:2198528 [1] NCCL INFO [Proxy Service] Device 1 CPU core 101 |
| n136-128-154:2198158:2198529 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 42 |
| n136-128-154:2198157:2198515 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. |
| n136-128-154:2198157:2198515 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 0 |
| n136-128-154:2198157:2198530 [0] NCCL INFO [Proxy Service] Device 0 CPU core 44 |
| n136-128-154:2198157:2198531 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 109 |
| n136-128-154:2198158:2198521 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2198158:2198521 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2198157:2198515 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512 |
| n136-128-154:2198157:2198515 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer |
| n136-128-154:2198157:2198515 [0] NCCL INFO CC Off, workFifoBytes 1048576 |
| n136-128-154:2198157:2198515 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2198157:2198515 [0] NCCL INFO ncclCommInitRankConfig comm 0x13b13560 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId c5000 commId 0x8dc6d77cdbe28b76 - Init COMPLETE |
| n136-128-154:2198157:2198515 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 2.30 (kernels 0.21, alloc 0.16, bootstrap 1.76, allgathers 0.01, topo 0.08, graphs 0.00, connections 0.03, rest 0.06) |
| n136-128-154:2198158:2198521 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. |
| n136-128-154:2198158:2198521 [1] NCCL INFO ncclCommInitRankConfig comm 0xf821860 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId c9000 commId 0x8dc6d77cdbe28b76 - Init COMPLETE |
| n136-128-154:2198158:2198521 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 2 total 0.43 (kernels 0.12, alloc 0.13, bootstrap 0.00, allgathers 0.01, topo 0.08, graphs 0.00, connections 0.03, rest 0.06) |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 16/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 17/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 18/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 19/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 20/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 21/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 22/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Channel 23/0 : 0[6] -> 1[7] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 16/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 17/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 18/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 19/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 20/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 21/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 22/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2198533 [1] NCCL INFO Channel 23/0 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198157:2198532 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| n136-128-154:2198158:2198533 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 |
| 2025-12-09:15:09:47 INFO [evaluator:559] Running loglikelihood requests |
| 2025-12-09:15:09:47 INFO [evaluator:559] Running loglikelihood requests |
|
Running loglikelihood requests: 0%| | 0/20084 [00:00<?, ?it/s] |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Passed argument batch_size = auto:1. Detecting largest batch size |
| Determined largest batch size: 57 |
| Determined largest batch size: 57 |
|
Running loglikelihood requests: 0%| | 1/20084 [00:40<226:18:05, 40.57s/it]
Running loglikelihood requests: 0%| | 58/20084 [00:45<3:14:10, 1.72it/s]
Running loglikelihood requests: 1%| | 115/20084 [00:50<1:36:43, 3.44it/s]
Running loglikelihood requests: 1%| | 172/20084 [00:55<1:04:51, 5.12it/s]
Running loglikelihood requests: 1%| | 229/20084 [00:59<49:42, 6.66it/s]
Running loglikelihood requests: 1%|▏ | 286/20084 [01:04<41:15, 8.00it/s]
Running loglikelihood requests: 2%|▏ | 343/20084 [01:08<36:04, 9.12it/s]
Running loglikelihood requests: 2%|▏ | 400/20084 [01:13<32:44, 10.02it/s]
Running loglikelihood requests: 2%|▏ | 457/20084 [01:17<30:30, 10.72it/s]
Running loglikelihood requests: 3%|▎ | 514/20084 [01:22<28:56, 11.27it/s]
Running loglikelihood requests: 3%|▎ | 571/20084 [01:26<27:49, 11.69it/s]
Running loglikelihood requests: 3%|▎ | 628/20084 [01:31<26:58, 12.02it/s]
Running loglikelihood requests: 3%|▎ | 685/20084 [01:35<26:20, 12.28it/s]
Running loglikelihood requests: 4%|▎ | 742/20084 [01:40<25:50, 12.48it/s]
Running loglikelihood requests: 4%|▍ | 799/20084 [01:44<25:26, 12.63it/s]
Running loglikelihood requests: 4%|▍ | 856/20084 [01:48<25:08, 12.75it/s]
Running loglikelihood requests: 5%|▍ | 913/20084 [01:53<24:52, 12.84it/s]
Running loglikelihood requests: 5%|▍ | 970/20084 [01:57<24:39, 12.92it/s]
Running loglikelihood requests: 5%|▌ | 1027/20084 [02:01<24:27, 12.99it/s]
Running loglikelihood requests: 5%|▌ | 1084/20084 [02:06<24:16, 13.05it/s]
Running loglikelihood requests: 6%|▌ | 1141/20084 [02:10<24:05, 13.10it/s]
Running loglikelihood requests: 6%|▌ | 1198/20084 [02:14<23:56, 13.14it/s]
Running loglikelihood requests: 6%|▌ | 1255/20084 [02:19<23:51, 13.15it/s]
Running loglikelihood requests: 7%|▋ | 1312/20084 [02:23<23:46, 13.16it/s]
Running loglikelihood requests: 7%|▋ | 1369/20084 [02:27<23:37, 13.20it/s]
Running loglikelihood requests: 7%|▋ | 1426/20084 [02:32<23:29, 13.24it/s]
Running loglikelihood requests: 7%|▋ | 1483/20084 [02:36<23:21, 13.27it/s]
Running loglikelihood requests: 8%|▊ | 1540/20084 [02:40<23:14, 13.30it/s]
Running loglikelihood requests: 8%|▊ | 1597/20084 [02:44<23:06, 13.34it/s]
Running loglikelihood requests: 8%|▊ | 1654/20084 [02:49<22:58, 13.37it/s]
Running loglikelihood requests: 9%|▊ | 1711/20084 [02:53<22:51, 13.40it/s]
Running loglikelihood requests: 9%|▉ | 1768/20084 [02:57<22:44, 13.43it/s]
Running loglikelihood requests: 9%|▉ | 1825/20084 [03:01<22:36, 13.46it/s]
Running loglikelihood requests: 9%|▉ | 1882/20084 [03:06<22:30, 13.48it/s]
Running loglikelihood requests: 10%|▉ | 1939/20084 [03:10<22:23, 13.51it/s]
Running loglikelihood requests: 10%|▉ | 1996/20084 [03:14<22:16, 13.53it/s]
Running loglikelihood requests: 10%|█ | 2053/20084 [03:18<22:10, 13.56it/s]
Running loglikelihood requests: 11%|█ | 2110/20084 [03:22<22:03, 13.58it/s]
Running loglikelihood requests: 11%|█ | 2167/20084 [03:26<21:56, 13.60it/s]
Running loglikelihood requests: 11%|█ | 2224/20084 [03:31<21:51, 13.62it/s]
Running loglikelihood requests: 11%|█▏ | 2281/20084 [03:35<21:45, 13.64it/s]
Running loglikelihood requests: 12%|█▏ | 2338/20084 [03:39<21:38, 13.66it/s]
Running loglikelihood requests: 12%|█▏ | 2395/20084 [03:43<21:32, 13.68it/s]
Running loglikelihood requests: 12%|█▏ | 2452/20084 [03:47<21:26, 13.71it/s]
Running loglikelihood requests: 12%|█▏ | 2509/20084 [03:51<21:21, 13.72it/s]
Running loglikelihood requests: 13%|█▎ | 2566/20084 [03:56<21:15, 13.74it/s]
Running loglikelihood requests: 13%|█▎ | 2623/20084 [04:00<21:10, 13.75it/s]
Running loglikelihood requests: 13%|█▎ | 2680/20084 [04:04<21:04, 13.76it/s]
Running loglikelihood requests: 14%|█▎ | 2737/20084 [04:08<20:58, 13.78it/s]
Running loglikelihood requests: 14%|█▍ | 2794/20084 [04:12<20:53, 13.80it/s]
Running loglikelihood requests: 14%|█▍ | 2851/20084 [04:16<20:47, 13.81it/s]
Running loglikelihood requests: 14%|█▍ | 2908/20084 [04:20<20:42, 13.83it/s]
Running loglikelihood requests: 15%|█▍ | 2965/20084 [04:24<20:36, 13.84it/s]
Running loglikelihood requests: 15%|█▌ | 3022/20084 [04:28<20:31, 13.85it/s]
Running loglikelihood requests: 15%|█▌ | 3079/20084 [04:33<20:26, 13.86it/s]
Running loglikelihood requests: 16%|█▌ | 3136/20084 [04:37<20:21, 13.88it/s]
Running loglikelihood requests: 16%|█▌ | 3193/20084 [04:41<20:15, 13.89it/s]
Running loglikelihood requests: 16%|█▌ | 3250/20084 [04:45<20:10, 13.90it/s]
Running loglikelihood requests: 16%|█▋ | 3307/20084 [04:49<20:05, 13.92it/s]
Running loglikelihood requests: 17%|█▋ | 3364/20084 [04:53<19:59, 13.93it/s]
Running loglikelihood requests: 17%|█▋ | 3421/20084 [04:57<19:54, 13.96it/s]
Running loglikelihood requests: 17%|█▋ | 3478/20084 [05:01<19:48, 13.98it/s]
Running loglikelihood requests: 18%|█▊ | 3535/20084 [05:05<19:42, 13.99it/s]
Running loglikelihood requests: 18%|█▊ | 3592/20084 [05:09<19:37, 14.01it/s]
Running loglikelihood requests: 18%|█▊ | 3649/20084 [05:13<19:31, 14.03it/s]
Running loglikelihood requests: 18%|█▊ | 3706/20084 [05:17<19:25, 14.05it/s]
Running loglikelihood requests: 19%|█▊ | 3763/20084 [05:21<19:20, 14.06it/s]
Running loglikelihood requests: 19%|█▉ | 3820/20084 [05:25<19:15, 14.07it/s]
Running loglikelihood requests: 19%|█▉ | 3877/20084 [05:30<19:10, 14.09it/s]
Running loglikelihood requests: 20%|█▉ | 3934/20084 [05:34<19:05, 14.10it/s]
Running loglikelihood requests: 20%|█▉ | 3991/20084 [05:38<18:59, 14.12it/s]
Running loglikelihood requests: 20%|██ | 4048/20084 [05:42<18:54, 14.14it/s]
Running loglikelihood requests: 20%|██ | 4105/20084 [05:46<18:48, 14.16it/s]
Running loglikelihood requests: 21%|██ | 4162/20084 [05:50<18:44, 14.16it/s]
Running loglikelihood requests: 21%|██ | 4219/20084 [05:54<18:39, 14.18it/s]
Running loglikelihood requests: 21%|██▏ | 4276/20084 [05:58<18:34, 14.19it/s]
Running loglikelihood requests: 22%|██▏ | 4333/20084 [06:02<18:29, 14.20it/s]
Running loglikelihood requests: 22%|██▏ | 4390/20084 [06:06<18:24, 14.20it/s]
Running loglikelihood requests: 22%|██▏ | 4447/20084 [06:10<18:20, 14.20it/s]
Running loglikelihood requests: 22%|██▏ | 4504/20084 [06:14<18:15, 14.23it/s]
Running loglikelihood requests: 23%|██▎ | 4561/20084 [06:18<18:09, 14.24it/s]
Running loglikelihood requests: 23%|██▎ | 4618/20084 [06:22<18:00, 14.31it/s]
Running loglikelihood requests: 23%|██▎ | 4675/20084 [06:26<17:53, 14.36it/s]
Running loglikelihood requests: 24%|██▎ | 4732/20084 [06:29<17:47, 14.38it/s]
Running loglikelihood requests: 24%|██▍ | 4789/20084 [06:33<17:41, 14.41it/s]
Running loglikelihood requests: 24%|██▍ | 4846/20084 [06:37<17:36, 14.43it/s]
Running loglikelihood requests: 24%|██▍ | 4903/20084 [06:41<17:33, 14.41it/s]
Running loglikelihood requests: 25%|██▍ | 4960/20084 [06:45<17:30, 14.39it/s]
Running loglikelihood requests: 25%|██▍ | 5017/20084 [06:49<17:27, 14.38it/s]
Running loglikelihood requests: 25%|██▌ | 5074/20084 [06:53<17:23, 14.38it/s]
Running loglikelihood requests: 26%|██▌ | 5131/20084 [06:57<17:19, 14.38it/s]
Running loglikelihood requests: 26%|██▌ | 5188/20084 [07:01<17:14, 14.39it/s]
Running loglikelihood requests: 26%|██▌ | 5245/20084 [07:05<17:10, 14.40it/s]
Running loglikelihood requests: 26%|██▋ | 5302/20084 [07:09<17:05, 14.41it/s]
Running loglikelihood requests: 27%|██▋ | 5359/20084 [07:13<17:01, 14.42it/s]
Running loglikelihood requests: 27%|██▋ | 5416/20084 [07:17<16:56, 14.43it/s]
Running loglikelihood requests: 27%|██▋ | 5473/20084 [07:21<16:51, 14.44it/s]
Running loglikelihood requests: 28%|██▊ | 5530/20084 [07:25<16:46, 14.46it/s]
Running loglikelihood requests: 28%|██▊ | 5587/20084 [07:29<16:41, 14.47it/s]
Running loglikelihood requests: 28%|██▊ | 5644/20084 [07:33<16:36, 14.49it/s]
Running loglikelihood requests: 28%|██▊ | 5701/20084 [07:37<16:31, 14.50it/s]
Running loglikelihood requests: 29%|██▊ | 5758/20084 [07:41<16:27, 14.51it/s]
Running loglikelihood requests: 29%|██▉ | 5815/20084 [07:44<16:19, 14.56it/s]
Running loglikelihood requests: 29%|██▉ | 5872/20084 [07:48<16:13, 14.60it/s]
Running loglikelihood requests: 30%|██▉ | 5929/20084 [07:52<16:07, 14.63it/s]
Running loglikelihood requests: 30%|██▉ | 5986/20084 [07:56<16:02, 14.65it/s]
Running loglikelihood requests: 30%|███ | 6043/20084 [08:00<15:58, 14.64it/s]
Running loglikelihood requests: 30%|███ | 6100/20084 [08:04<15:54, 14.64it/s]
Running loglikelihood requests: 31%|███ | 6157/20084 [08:08<15:50, 14.65it/s]
Running loglikelihood requests: 31%|███ | 6214/20084 [08:12<15:46, 14.66it/s]
Running loglikelihood requests: 31%|███ | 6271/20084 [08:15<15:42, 14.65it/s]
Running loglikelihood requests: 32%|███▏ | 6328/20084 [08:19<15:37, 14.67it/s]
Running loglikelihood requests: 32%|███▏ | 6385/20084 [08:23<15:32, 14.68it/s]
Running loglikelihood requests: 32%|███▏ | 6442/20084 [08:27<15:28, 14.69it/s]
Running loglikelihood requests: 32%|███▏ | 6499/20084 [08:31<15:23, 14.70it/s]
Running loglikelihood requests: 33%|███▎ | 6556/20084 [08:35<15:19, 14.71it/s]
Running loglikelihood requests: 33%|███▎ | 6613/20084 [08:39<15:14, 14.73it/s]
Running loglikelihood requests: 33%|███▎ | 6670/20084 [08:43<15:10, 14.74it/s]
Running loglikelihood requests: 33%|███▎ | 6727/20084 [08:46<15:05, 14.75it/s]
Running loglikelihood requests: 34%|███▍ | 6784/20084 [08:50<15:01, 14.75it/s]
Running loglikelihood requests: 34%|███▍ | 6841/20084 [08:54<14:57, 14.75it/s]
Running loglikelihood requests: 34%|███▍ | 6898/20084 [08:58<14:53, 14.76it/s]
Running loglikelihood requests: 35%|███▍ | 6955/20084 [09:02<14:49, 14.77it/s]
Running loglikelihood requests: 35%|███▍ | 7012/20084 [09:06<14:44, 14.78it/s]
Running loglikelihood requests: 35%|███▌ | 7069/20084 [09:10<14:39, 14.80it/s]
Running loglikelihood requests: 35%|███▌ | 7126/20084 [09:13<14:31, 14.87it/s]
Running loglikelihood requests: 36%|███▌ | 7183/20084 [09:17<14:24, 14.92it/s]
Running loglikelihood requests: 36%|███▌ | 7240/20084 [09:21<14:19, 14.95it/s]
Running loglikelihood requests: 36%|███▋ | 7297/20084 [09:25<14:15, 14.94it/s]
Running loglikelihood requests: 37%|███▋ | 7354/20084 [09:29<14:12, 14.94it/s]
Running loglikelihood requests: 37%|███▋ | 7411/20084 [09:32<14:09, 14.93it/s]
Running loglikelihood requests: 37%|███▋ | 7468/20084 [09:36<14:05, 14.93it/s]
Running loglikelihood requests: 37%|███▋ | 7525/20084 [09:40<14:00, 14.93it/s]
Running loglikelihood requests: 38%|███▊ | 7582/20084 [09:44<13:56, 14.94it/s]
Running loglikelihood requests: 38%|███▊ | 7639/20084 [09:48<13:52, 14.94it/s]
Running loglikelihood requests: 38%|███▊ | 7696/20084 [09:51<13:48, 14.96it/s]
Running loglikelihood requests: 39%|███▊ | 7753/20084 [09:55<13:43, 14.97it/s]
Running loglikelihood requests: 39%|███▉ | 7810/20084 [09:59<13:39, 14.97it/s]
Running loglikelihood requests: 39%|███▉ | 7867/20084 [10:03<13:35, 14.98it/s]
Running loglikelihood requests: 39%|███▉ | 7924/20084 [10:07<13:30, 15.00it/s]
Running loglikelihood requests: 40%|███▉ | 7981/20084 [10:10<13:26, 15.01it/s]
Running loglikelihood requests: 40%|████ | 8038/20084 [10:14<13:22, 15.02it/s]
Running loglikelihood requests: 40%|████ | 8095/20084 [10:18<13:17, 15.02it/s]
Running loglikelihood requests: 41%|████ | 8152/20084 [10:22<13:13, 15.04it/s]
Running loglikelihood requests: 41%|████ | 8209/20084 [10:26<13:08, 15.05it/s]
Running loglikelihood requests: 41%|████ | 8266/20084 [10:29<13:05, 15.05it/s]
Running loglikelihood requests: 41%|████▏ | 8323/20084 [10:33<13:01, 15.06it/s]
Running loglikelihood requests: 42%|████▏ | 8380/20084 [10:37<12:56, 15.08it/s]
Running loglikelihood requests: 42%|████▏ | 8437/20084 [10:41<12:51, 15.09it/s]
Running loglikelihood requests: 42%|████▏ | 8494/20084 [10:44<12:47, 15.09it/s]
Running loglikelihood requests: 43%|████▎ | 8551/20084 [10:48<12:41, 15.15it/s]
Running loglikelihood requests: 43%|████▎ | 8608/20084 [10:52<12:35, 15.19it/s]
Running loglikelihood requests: 43%|████▎ | 8665/20084 [10:56<12:29, 15.23it/s]
Running loglikelihood requests: 43%|████▎ | 8722/20084 [10:59<12:26, 15.22it/s]
Running loglikelihood requests: 44%|████▎ | 8779/20084 [11:03<12:23, 15.20it/s]
Running loglikelihood requests: 44%|████▍ | 8836/20084 [11:07<12:20, 15.20it/s]
Running loglikelihood requests: 44%|████▍ | 8893/20084 [11:11<12:16, 15.20it/s]
Running loglikelihood requests: 45%|████▍ | 8950/20084 [11:14<12:12, 15.20it/s]
Running loglikelihood requests: 45%|████▍ | 9007/20084 [11:18<12:08, 15.20it/s]
Running loglikelihood requests: 45%|████▌ | 9064/20084 [11:22<12:04, 15.21it/s]
Running loglikelihood requests: 45%|████▌ | 9121/20084 [11:26<11:58, 15.26it/s]
Running loglikelihood requests: 46%|████▌ | 9178/20084 [11:29<11:53, 15.28it/s]
Running loglikelihood requests: 46%|████▌ | 9235/20084 [11:33<11:48, 15.31it/s]
Running loglikelihood requests: 46%|████▋ | 9292/20084 [11:37<11:45, 15.30it/s]
Running loglikelihood requests: 47%|████▋ | 9349/20084 [11:40<11:41, 15.30it/s]
Running loglikelihood requests: 47%|████▋ | 9406/20084 [11:44<11:37, 15.30it/s]
Running loglikelihood requests: 47%|████▋ | 9463/20084 [11:48<11:33, 15.31it/s]
Running loglikelihood requests: 47%|████▋ | 9520/20084 [11:52<11:29, 15.32it/s]
Running loglikelihood requests: 48%|████▊ | 9577/20084 [11:55<11:25, 15.32it/s]
Running loglikelihood requests: 48%|████▊ | 9634/20084 [11:59<11:21, 15.33it/s]
Running loglikelihood requests: 48%|████▊ | 9691/20084 [12:03<11:15, 15.39it/s]
Running loglikelihood requests: 49%|████▊ | 9748/20084 [12:06<11:09, 15.43it/s]
Running loglikelihood requests: 49%|████▉ | 9805/20084 [12:10<11:04, 15.47it/s]
Running loglikelihood requests: 49%|████▉ | 9862/20084 [12:14<10:59, 15.49it/s]
Running loglikelihood requests: 49%|████▉ | 9919/20084 [12:17<10:55, 15.52it/s]
Running loglikelihood requests: 50%|████▉ | 9976/20084 [12:21<10:51, 15.50it/s]
Running loglikelihood requests: 50%|████▉ | 10033/20084 [12:25<10:48, 15.50it/s]
Running loglikelihood requests: 50%|█████ | 10090/20084 [12:28<10:45, 15.48it/s]
Running loglikelihood requests: 51%|█████ | 10147/20084 [12:32<10:41, 15.48it/s]
Running loglikelihood requests: 51%|█████ | 10204/20084 [12:36<10:37, 15.49it/s]
Running loglikelihood requests: 51%|█████ | 10261/20084 [12:39<10:33, 15.50it/s]
Running loglikelihood requests: 51%|█████▏ | 10318/20084 [12:43<10:30, 15.50it/s]
Running loglikelihood requests: 52%|█████▏ | 10375/20084 [12:47<10:26, 15.51it/s]
Running loglikelihood requests: 52%|█████▏ | 10432/20084 [12:50<10:20, 15.56it/s]
Running loglikelihood requests: 52%|█████▏ | 10489/20084 [12:54<10:15, 15.58it/s]
Running loglikelihood requests: 53%|█████▎ | 10546/20084 [12:58<10:11, 15.61it/s]
Running loglikelihood requests: 53%|█████▎ | 10603/20084 [13:01<10:06, 15.64it/s]
Running loglikelihood requests: 53%|█████▎ | 10660/20084 [13:05<10:01, 15.66it/s]
Running loglikelihood requests: 53%|█████▎ | 10717/20084 [13:09<09:57, 15.68it/s]
Running loglikelihood requests: 54%|█████▎ | 10774/20084 [13:12<09:53, 15.69it/s]
Running loglikelihood requests: 54%|█████▍ | 10831/20084 [13:16<09:50, 15.67it/s]
Running loglikelihood requests: 54%|█████▍ | 10888/20084 [13:20<09:47, 15.64it/s]
Running loglikelihood requests: 54%|█████▍ | 10945/20084 [13:23<09:46, 15.59it/s]
Running loglikelihood requests: 55%|█████▍ | 11002/20084 [13:27<09:42, 15.59it/s]
Running loglikelihood requests: 55%|█████▌ | 11059/20084 [13:31<09:37, 15.62it/s]
Running loglikelihood requests: 55%|█████▌ | 11116/20084 [13:34<09:33, 15.63it/s]
Running loglikelihood requests: 56%|█████▌ | 11173/20084 [13:38<09:29, 15.64it/s]
Running loglikelihood requests: 56%|█████▌ | 11230/20084 [13:41<09:23, 15.71it/s]
Running loglikelihood requests: 56%|█████▌ | 11287/20084 [13:45<09:18, 15.75it/s]
Running loglikelihood requests: 56%|█████▋ | 11344/20084 [13:49<09:13, 15.78it/s]
Running loglikelihood requests: 57%|█████▋ | 11401/20084 [13:52<09:09, 15.81it/s]
Running loglikelihood requests: 57%|█████▋ | 11458/20084 [13:56<09:04, 15.84it/s]
Running loglikelihood requests: 57%|█████▋ | 11515/20084 [13:59<09:00, 15.85it/s]
Running loglikelihood requests: 58%|█████▊ | 11572/20084 [14:03<08:57, 15.83it/s]
Running loglikelihood requests: 58%|█████▊ | 11629/20084 [14:07<08:54, 15.81it/s]
Running loglikelihood requests: 58%|█████▊ | 11686/20084 [14:10<08:51, 15.81it/s]
Running loglikelihood requests: 58%|█████▊ | 11743/20084 [14:14<08:47, 15.81it/s]
Running loglikelihood requests: 59%|█████▉ | 11800/20084 [14:17<08:43, 15.82it/s]
Running loglikelihood requests: 59%|█████▉ | 11857/20084 [14:21<08:39, 15.82it/s]
Running loglikelihood requests: 59%|█████▉ | 11914/20084 [14:25<08:36, 15.82it/s]
Running loglikelihood requests: 60%|█████▉ | 11971/20084 [14:28<08:31, 15.86it/s]
Running loglikelihood requests: 60%|█████▉ | 12028/20084 [14:32<08:26, 15.90it/s]
Running loglikelihood requests: 60%|██████ | 12085/20084 [14:35<08:22, 15.92it/s]
Running loglikelihood requests: 60%|██████ | 12142/20084 [14:39<08:18, 15.95it/s]
Running loglikelihood requests: 61%|██████ | 12199/20084 [14:42<08:13, 15.97it/s]
Running loglikelihood requests: 61%|██████ | 12256/20084 [14:46<08:10, 15.96it/s]
Running loglikelihood requests: 61%|██████▏ | 12313/20084 [14:50<08:07, 15.96it/s]
Running loglikelihood requests: 62%|██████▏ | 12370/20084 [14:53<08:03, 15.96it/s]
Running loglikelihood requests: 62%|██████▏ | 12427/20084 [14:57<07:59, 15.96it/s]
Running loglikelihood requests: 62%|██████▏ | 12484/20084 [15:00<07:56, 15.97it/s]
Running loglikelihood requests: 62%|██████▏ | 12541/20084 [15:04<07:52, 15.97it/s]
Running loglikelihood requests: 63%|██████▎ | 12598/20084 [15:07<07:48, 15.97it/s]
Running loglikelihood requests: 63%|██████▎ | 12655/20084 [15:11<07:44, 15.98it/s]
Running loglikelihood requests: 63%|██████▎ | 12712/20084 [15:15<07:39, 16.03it/s]
Running loglikelihood requests: 64%|██████▎ | 12769/20084 [15:18<07:35, 16.07it/s]
Running loglikelihood requests: 64%|██████▍ | 12826/20084 [15:22<07:30, 16.11it/s]
Running loglikelihood requests: 64%|██████▍ | 12883/20084 [15:25<07:26, 16.14it/s]
Running loglikelihood requests: 64%|██████▍ | 12940/20084 [15:29<07:22, 16.15it/s]
Running loglikelihood requests: 65%|██████▍ | 12997/20084 [15:32<07:19, 16.14it/s]
Running loglikelihood requests: 65%|██████▍ | 13054/20084 [15:36<07:15, 16.13it/s]
Running loglikelihood requests: 65%|██████▌ | 13111/20084 [15:39<07:11, 16.17it/s]
Running loglikelihood requests: 66%|██████▌ | 13168/20084 [15:43<07:06, 16.21it/s]
Running loglikelihood requests: 66%|██████▌ | 13225/20084 [15:46<07:02, 16.22it/s]
Running loglikelihood requests: 66%|██████▌ | 13282/20084 [15:50<06:59, 16.23it/s]
Running loglikelihood requests: 66%|██████▋ | 13339/20084 [15:53<06:55, 16.25it/s]
Running loglikelihood requests: 67%|██████▋ | 13396/20084 [15:57<06:51, 16.27it/s]
Running loglikelihood requests: 67%|██████▋ | 13453/20084 [16:00<06:47, 16.29it/s]
Running loglikelihood requests: 67%|██████▋ | 13510/20084 [16:04<06:43, 16.30it/s]
Running loglikelihood requests: 68%|██████▊ | 13567/20084 [16:07<06:39, 16.32it/s]
Running loglikelihood requests: 68%|██████▊ | 13624/20084 [16:11<06:36, 16.31it/s]
Running loglikelihood requests: 68%|██████▊ | 13681/20084 [16:14<06:32, 16.30it/s]
Running loglikelihood requests: 68%|██████▊ | 13738/20084 [16:18<06:29, 16.29it/s]
Running loglikelihood requests: 69%|██████▊ | 13795/20084 [16:21<06:25, 16.30it/s]
Running loglikelihood requests: 69%|██████▉ | 13852/20084 [16:25<06:22, 16.30it/s]
Running loglikelihood requests: 69%|██████▉ | 13909/20084 [16:28<06:18, 16.31it/s]
Running loglikelihood requests: 70%|██████▉ | 13966/20084 [16:32<06:13, 16.36it/s]
Running loglikelihood requests: 70%|██████▉ | 14023/20084 [16:35<06:09, 16.40it/s]
Running loglikelihood requests: 70%|███████ | 14080/20084 [16:39<06:05, 16.44it/s]
Running loglikelihood requests: 70%|███████ | 14137/20084 [16:42<06:01, 16.46it/s]
Running loglikelihood requests: 71%|███████ | 14194/20084 [16:45<05:56, 16.50it/s]
Running loglikelihood requests: 71%|███████ | 14251/20084 [16:49<05:53, 16.52it/s]
Running loglikelihood requests: 71%|███████ | 14308/20084 [16:52<05:48, 16.55it/s]
Running loglikelihood requests: 72%|███████▏ | 14365/20084 [16:56<05:45, 16.57it/s]
Running loglikelihood requests: 72%|███████▏ | 14422/20084 [16:59<05:42, 16.55it/s]
Running loglikelihood requests: 72%|███████▏ | 14479/20084 [17:03<05:38, 16.57it/s]
Running loglikelihood requests: 72%|███████▏ | 14536/20084 [17:06<05:34, 16.59it/s]
Running loglikelihood requests: 73%|███████▎ | 14593/20084 [17:09<05:30, 16.60it/s]
Running loglikelihood requests: 73%|███████▎ | 14650/20084 [17:13<05:27, 16.61it/s]
Running loglikelihood requests: 73%|███████▎ | 14707/20084 [17:16<05:24, 16.59it/s]
Running loglikelihood requests: 74%|███████▎ | 14764/20084 [17:20<05:21, 16.57it/s]
Running loglikelihood requests: 74%|███████▍ | 14821/20084 [17:23<05:17, 16.57it/s]
Running loglikelihood requests: 74%|███████▍ | 14878/20084 [17:27<05:13, 16.59it/s]
Running loglikelihood requests: 74%|███████▍ | 14935/20084 [17:30<05:10, 16.60it/s]
Running loglikelihood requests: 75%|███████▍ | 14992/20084 [17:33<05:06, 16.61it/s]
Running loglikelihood requests: 75%|███████▍ | 15049/20084 [17:37<05:02, 16.63it/s]
Running loglikelihood requests: 75%|███████▌ | 15106/20084 [17:40<04:58, 16.67it/s]
Running loglikelihood requests: 75%|███████▌ | 15163/20084 [17:44<04:54, 16.73it/s]
Running loglikelihood requests: 76%|███████▌ | 15220/20084 [17:47<04:50, 16.77it/s]
Running loglikelihood requests: 76%|███████▌ | 15277/20084 [17:50<04:46, 16.80it/s]
Running loglikelihood requests: 76%|███████▋ | 15334/20084 [17:54<04:42, 16.83it/s]
Running loglikelihood requests: 77%|███████▋ | 15391/20084 [17:57<04:38, 16.85it/s]
Running loglikelihood requests: 77%|███████▋ | 15448/20084 [18:01<04:34, 16.87it/s]
Running loglikelihood requests: 77%|███████▋ | 15505/20084 [18:04<04:31, 16.89it/s]
Running loglikelihood requests: 77%|███████▋ | 15562/20084 [18:07<04:27, 16.90it/s]
Running loglikelihood requests: 78%|███████▊ | 15619/20084 [18:11<04:23, 16.92it/s]
Running loglikelihood requests: 78%|███████▊ | 15676/20084 [18:14<04:20, 16.92it/s]
Running loglikelihood requests: 78%|███████▊ | 15733/20084 [18:17<04:16, 16.94it/s]
Running loglikelihood requests: 79%|███████▊ | 15790/20084 [18:21<04:12, 16.99it/s]
Running loglikelihood requests: 79%|███████▉ | 15847/20084 [18:24<04:09, 17.02it/s]
Running loglikelihood requests: 79%|███████▉ | 15904/20084 [18:27<04:05, 17.02it/s]
Running loglikelihood requests: 79%|███████▉ | 15961/20084 [18:31<04:02, 17.00it/s]
Running loglikelihood requests: 80%|███████▉ | 16018/20084 [18:34<03:59, 17.01it/s]
Running loglikelihood requests: 80%|████████ | 16075/20084 [18:37<03:55, 17.02it/s]
Running loglikelihood requests: 80%|████████ | 16132/20084 [18:41<03:52, 17.03it/s]
Running loglikelihood requests: 81%|████████ | 16189/20084 [18:44<03:48, 17.08it/s]
Running loglikelihood requests: 81%|████████ | 16246/20084 [18:47<03:44, 17.13it/s]
Running loglikelihood requests: 81%|████████ | 16303/20084 [18:51<03:40, 17.19it/s]
Running loglikelihood requests: 81%|████████▏ | 16360/20084 [18:54<03:36, 17.21it/s]
Running loglikelihood requests: 82%|โโโโโโโโโ | 16417/20084 [18:57<03:32, 17.24it/s]
Running loglikelihood requests: 82%|โโโโโโโโโ | 16474/20084 [19:01<03:29, 17.26it/s]
Running loglikelihood requests: 82%|โโโโโโโโโ | 16531/20084 [19:04<03:25, 17.28it/s]
Running loglikelihood requests: 83%|โโโโโโโโโ | 16588/20084 [19:07<03:21, 17.31it/s]
Running loglikelihood requests: 83%|โโโโโโโโโ | 16645/20084 [19:10<03:18, 17.33it/s]
Running loglikelihood requests: 83%|โโโโโโโโโ | 16702/20084 [19:14<03:14, 17.35it/s]
Running loglikelihood requests: 83%|โโโโโโโโโ | 16759/20084 [19:17<03:11, 17.38it/s]
Running loglikelihood requests: 84%|โโโโโโโโโ | 16816/20084 [19:20<03:07, 17.39it/s]
Running loglikelihood requests: 84%|โโโโโโโโโ | 16873/20084 [19:24<03:04, 17.37it/s]
Running loglikelihood requests: 84%|โโโโโโโโโ | 16930/20084 [19:27<03:01, 17.37it/s]
Running loglikelihood requests: 85%|โโโโโโโโโ | 16987/20084 [19:30<02:58, 17.38it/s]
Running loglikelihood requests: 85%|โโโโโโโโโ | 17044/20084 [19:33<02:54, 17.40it/s]
Running loglikelihood requests: 85%|โโโโโโโโโ | 17101/20084 [19:37<02:50, 17.46it/s]
Running loglikelihood requests: 85%|โโโโโโโโโ | 17158/20084 [19:40<02:47, 17.51it/s]
Running loglikelihood requests: 86%|โโโโโโโโโ | 17215/20084 [19:43<02:43, 17.56it/s]
Running loglikelihood requests: 86%|โโโโโโโโโ | 17272/20084 [19:46<02:39, 17.59it/s]
Running loglikelihood requests: 86%|โโโโโโโโโ | 17329/20084 [19:50<02:36, 17.59it/s]
Running loglikelihood requests: 87%|โโโโโโโโโ | 17386/20084 [19:53<02:33, 17.63it/s]
Running loglikelihood requests: 87%|โโโโโโโโโ | 17443/20084 [19:56<02:29, 17.66it/s]
Running loglikelihood requests: 87%|โโโโโโโโโ | 17500/20084 [19:59<02:26, 17.69it/s]
Running loglikelihood requests: 87%|โโโโโโโโโ | 17557/20084 [20:02<02:22, 17.72it/s]
Running loglikelihood requests: 88%|โโโโโโโโโ | 17614/20084 [20:06<02:19, 17.76it/s]
Running loglikelihood requests: 88%|โโโโโโโโโ | 17671/20084 [20:09<02:15, 17.78it/s]
Running loglikelihood requests: 88%|โโโโโโโโโ | 17728/20084 [20:12<02:12, 17.82it/s]
Running loglikelihood requests: 89%|โโโโโโโโโ | 17785/20084 [20:15<02:08, 17.86it/s]
Running loglikelihood requests: 89%|โโโโโโโโโ | 17842/20084 [20:18<02:05, 17.90it/s]
Running loglikelihood requests: 89%|โโโโโโโโโ | 17899/20084 [20:21<02:01, 17.94it/s]
Running loglikelihood requests: 89%|โโโโโโโโโ | 17956/20084 [20:25<01:58, 17.98it/s]
Running loglikelihood requests: 90%|โโโโโโโโโ | 18013/20084 [20:28<01:55, 18.00it/s]
Running loglikelihood requests: 90%|โโโโโโโโโ | 18070/20084 [20:31<01:51, 18.01it/s]
Running loglikelihood requests: 90%|โโโโโโโโโ | 18127/20084 [20:34<01:48, 18.03it/s]
Running loglikelihood requests: 91%|โโโโโโโโโ | 18184/20084 [20:37<01:45, 18.08it/s]
Running loglikelihood requests: 91%|โโโโโโโโโ | 18241/20084 [20:40<01:41, 18.10it/s]
Running loglikelihood requests: 91%|โโโโโโโโโ | 18298/20084 [20:43<01:38, 18.13it/s]
Running loglikelihood requests: 91%|โโโโโโโโโโ| 18355/20084 [20:47<01:35, 18.17it/s]
Running loglikelihood requests: 92%|โโโโโโโโโโ| 18412/20084 [20:50<01:31, 18.23it/s]
Running loglikelihood requests: 92%|โโโโโโโโโโ| 18469/20084 [20:53<01:28, 18.27it/s]
Running loglikelihood requests: 92%|โโโโโโโโโโ| 18526/20084 [20:56<01:24, 18.33it/s]
Running loglikelihood requests: 93%|โโโโโโโโโโ| 18583/20084 [20:59<01:21, 18.35it/s]
Running loglikelihood requests: 93%|โโโโโโโโโโ| 18640/20084 [21:02<01:18, 18.37it/s]
Running loglikelihood requests: 93%|โโโโโโโโโโ| 18697/20084 [21:05<01:15, 18.45it/s]
Running loglikelihood requests: 93%|โโโโโโโโโโ| 18754/20084 [21:08<01:11, 18.49it/s]
Running loglikelihood requests: 94%|โโโโโโโโโโ| 18811/20084 [21:11<01:08, 18.55it/s]
Running loglikelihood requests: 94%|โโโโโโโโโโ| 18868/20084 [21:14<01:05, 18.64it/s]
Running loglikelihood requests: 94%|โโโโโโโโโโ| 18925/20084 [21:17<01:01, 18.72it/s]
Running loglikelihood requests: 95%|โโโโโโโโโโ| 18982/20084 [21:20<00:58, 18.77it/s]
Running loglikelihood requests: 95%|โโโโโโโโโโ| 19039/20084 [21:23<00:55, 18.82it/s]
Running loglikelihood requests: 95%|โโโโโโโโโโ| 19096/20084 [21:26<00:52, 18.89it/s]
Running loglikelihood requests: 95%|โโโโโโโโโโ| 19153/20084 [21:29<00:49, 18.96it/s]
Running loglikelihood requests: 96%|โโโโโโโโโโ| 19210/20084 [21:32<00:45, 19.02it/s]
Running loglikelihood requests: 96%|โโโโโโโโโโ| 19267/20084 [21:35<00:42, 19.13it/s]
Running loglikelihood requests: 96%|โโโโโโโโโโ| 19324/20084 [21:38<00:39, 19.20it/s]
Running loglikelihood requests: 96%|โโโโโโโโโโ| 19381/20084 [21:41<00:36, 19.27it/s]
Running loglikelihood requests: 97%|โโโโโโโโโโ| 19438/20084 [21:44<00:33, 19.35it/s]
Running loglikelihood requests: 97%|โโโโโโโโโโ| 19495/20084 [21:47<00:30, 19.44it/s]
Running loglikelihood requests: 97%|โโโโโโโโโโ| 19552/20084 [21:50<00:27, 19.53it/s]
Running loglikelihood requests: 98%|โโโโโโโโโโ| 19609/20084 [21:53<00:24, 19.64it/s]
Running loglikelihood requests: 98%|โโโโโโโโโโ| 19666/20084 [21:56<00:21, 19.74it/s]
Running loglikelihood requests: 98%|โโโโโโโโโโ| 19723/20084 [21:58<00:18, 19.91it/s]
Running loglikelihood requests: 98%|โโโโโโโโโโ| 19780/20084 [22:01<00:15, 20.09it/s]
Running loglikelihood requests: 99%|โโโโโโโโโโ| 19837/20084 [22:04<00:12, 20.32it/s]
Running loglikelihood requests: 99%|โโโโโโโโโโ| 19894/20084 [22:07<00:09, 20.54it/s]
Running loglikelihood requests: 99%|โโโโโโโโโโ| 19951/20084 [22:09<00:06, 20.81it/s]
Running loglikelihood requests: 100%|โโโโโโโโโโ| 20008/20084 [22:12<00:03, 21.17it/s]
Running loglikelihood requests: 100%|โโโโโโโโโโ| 20065/20084 [22:13<00:00, 26.61it/s]
Running loglikelihood requests: 100%|██████████| 20084/20084 [22:13<00:00, 15.07it/s] |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 05/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 06/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 07/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 08/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 09/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 10/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 11/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 12/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 13/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 14/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 15/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 16/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 17/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 18/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 19/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 20/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 21/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 22/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 23/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 24/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 25/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 26/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 27/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 28/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 29/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 30/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| n136-128-154:2198158:2206208 [1] NCCL INFO Channel 31/1 : 1[7] -> 0[6] via P2P/CUMEM/read |
| fatal: detected dubious ownership in repository at '/mnt/bn/life-mllm/users/cxr/quantization' |
| To add an exception for this directory, call: |
|
|
| git config |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2198158:2198528 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| 2025-12-09:15:33:33 INFO [loggers.evaluation_tracker:209] Saving results aggregated |
| hf (pretrained=/mnt/bn/life-mllm/users/cxr/quantization/models/Qwen2.5-7B-quantization-fg), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (57) |
| |  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr| |
| |---------|------:|------|-----:|--------|---|-----:|---|-----:| |
| |hellaswag|      1|none  |    10|acc     |↑  |0.4218|±  |0.0049| |
| |         |       |none  |    10|acc_norm|↑  |0.5710|±  |0.0049| |
|
|
| [rank0]:[W1209 15:33:35.615863487 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2198157:2198530 [0] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:64 -> 3 |
| n136-128-154:2198158:2198528 [1] NCCL INFO misc/socket.cc:915 -> 3 |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:81 -> 3 |
| n136-128-154:2198157:2206266 [0] NCCL INFO misc/socket.cc:863 -> 3 |
| n136-128-154:2198158:2206213 [1] NCCL INFO comm 0xf821860 rank 1 nranks 2 cudaDev 1 busId c9000 - Abort COMPLETE |
| n136-128-154:2198157:2206266 [0] NCCL INFO comm 0x13b13560 rank 0 nranks 2 cudaDev 0 busId c5000 - Abort COMPLETE |
| Task hellaswag evaluation complete! |
| Execution time: 2885.37 seconds |
|
|