Submitted by Zmushko Philip
One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining
Yandex Research