llm_cp2 / slurm-6181860.out
Start time: Sat Nov 1 14:02:07 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Start time: Sat Nov 1 14:07:35 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-11-01 06:07:53] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792]
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] *****************************************
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] *****************************************
slurmstepd: error: *** JOB 6181860 ON SH-IDCA1404-10-140-54-69 CANCELLED AT 2025-11-01T14:08:08 DUE TO PREEMPTION ***
W1101 06:08:08.849000 40488 site-packages/torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
W1101 06:08:08.851000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40555 closing signal SIGTERM
W1101 06:08:08.852000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40556 closing signal SIGTERM
W1101 06:08:08.853000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40557 closing signal SIGTERM
W1101 06:08:08.854000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40558 closing signal SIGTERM
W1101 06:08:08.855000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40559 closing signal SIGTERM
W1101 06:08:08.856000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40560 closing signal SIGTERM
W1101 06:08:08.857000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40561 closing signal SIGTERM
W1101 06:08:08.858000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40562 closing signal SIGTERM
Start time: Sat Nov 1 14:14:51 CST 2025
Node list: SH-IDCA1404-10-140-54-51
Total processes: 8
Current job ID: 6181860
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Start time: Sat Nov 1 14:36:14 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-11-01 06:36:31] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792]
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] *****************************************
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] *****************************************
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,994] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,994] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources