"""composer_replication.diloco.serverless: run Decoupled DiLoCo across
serverless training systems (Modal, HuggingFace Jobs, SageMaker, k8s, …).

Per ADR-005, the design rests on two abstractions:

1. `ServerlessExecutor` Protocol: a uniform interface for spinning up
   N replicas on different cloud backends. Each backend (Modal, HF Jobs,
   SageMaker, etc.) gets a concrete adapter that implements the Protocol.

2. `ObjectStoreAllReduce`: fsspec-backed pseudo-gradient exchange that
   replaces the in-process `torchft.Manager.allreduce` call. The
   communication pattern is one S3 `PutObject` plus N `GetObject` calls
   once per ~500-1000 inner steps, which matches DiLoCo's actual sync
   cadence (paper arXiv:2311.08105 §3.2). Bandwidth: ~2 GB per 30 minutes
   per replica for a 1B-parameter bf16 model (1e9 parameters x 2 bytes),
   well within the S3 free tier.

The framework's existing `composer_replication.diloco.make_diloco_outer_loop`
wraps `torchft.local_sgd.DiLoCo`. To run that across N serverless replicas:

    >>> from composer_replication.diloco.serverless import (
    ...     LocalProcessExecutor,
    ...     ObjectStoreAllReduce,
    ... )
    >>> rendezvous = ObjectStoreAllReduce("s3://my-bucket/diloco-runs/run42/")
    >>> executor = LocalProcessExecutor()
    >>> handles = executor.launch_replicas(
    ...     n_replicas=4,
    ...     entrypoint="composer_replication.diloco.serverless.replica_entrypoint",
    ...     entrypoint_args={"rendezvous": rendezvous.uri, "rank_env": "REPLICA_RANK"},
    ... )
    >>> result = executor.collect(handles, timeout=3600)

Module layout:

- `executor.py`: `ServerlessExecutor` Protocol + base classes + `LocalProcessExecutor`
- `allreduce.py`: `ObjectStoreAllReduce` + `MockManager` (drops into the torchft path)
- `modal.py`: `ModalExecutor` (skeleton; implemented when modal-client is available)
- `hf_jobs.py`: `HFJobsExecutor` (skeleton; uses huggingface_hub.run_job)
- `replica_entrypoint.py`: script each replica runs (loaded from the object store)

Optional dependency: `pip install -e .[serverless]` pulls fsspec + s3fs +
gcsfs. The Modal and HF Jobs adapters require `modal` and `huggingface_hub`
respectively; both are checked at adapter init time, not at module import.
"""
from __future__ import annotations

from composer_replication.diloco.serverless.allreduce import (
    MockManager,
    ObjectStoreAllReduce,
)
from composer_replication.diloco.serverless.executor import (
    LocalProcessExecutor,
    ReplicaHandle,
    ServerlessExecutor,
)
from composer_replication.diloco.serverless.hf_jobs import HFJobsExecutor
from composer_replication.diloco.serverless.modal import ModalExecutor

__all__ = [
    "HFJobsExecutor",
    "LocalProcessExecutor",
    "MockManager",
    "ModalExecutor",
    "ObjectStoreAllReduce",
    "ReplicaHandle",
    "ServerlessExecutor",
]
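
The module docstring describes the pseudo-gradient exchange as one put plus N gets per sync round. The sketch below illustrates that pattern only; it is not the package's `ObjectStoreAllReduce`. It assumes a local directory in place of the fsspec-backed bucket, plain Python lists in place of tensors, and a simple mean as the reduction. The names `DirectoryAllReduce`, `put`, `gather_mean`, and `sync_bytes` are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


class DirectoryAllReduce:
    """Hypothetical stand-in: a directory plays the role of the bucket prefix."""

    def __init__(self, root: str, n_replicas: int) -> None:
        self.root = Path(root)
        self.n_replicas = n_replicas

    def _key(self, round_idx: int, rank: int) -> Path:
        # One object per (round, rank), mirroring one PutObject per replica.
        return self.root / f"round-{round_idx:06d}" / f"rank-{rank:04d}.json"

    def put(self, round_idx: int, rank: int, pseudo_grad: list) -> None:
        path = self._key(round_idx, rank)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(pseudo_grad))
        tmp.replace(path)  # rename last, so readers never see a partial file

    def gather_mean(self, round_idx: int, timeout: float = 10.0,
                    poll: float = 0.05) -> list:
        # N reads, one per replica, after waiting for every object to appear.
        paths = [self._key(round_idx, r) for r in range(self.n_replicas)]
        deadline = time.monotonic() + timeout
        while not all(p.exists() for p in paths):
            if time.monotonic() > deadline:
                raise TimeoutError(f"round {round_idx}: missing replica uploads")
            time.sleep(poll)
        grads = [json.loads(p.read_text()) for p in paths]
        # Element-wise mean across replicas (assumed reduction for this sketch).
        return [sum(vals) / self.n_replicas for vals in zip(*grads)]


def sync_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Payload size of one upload; bf16 stores 2 bytes per parameter."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bucket:
        store = DirectoryAllReduce(bucket, n_replicas=2)
        store.put(0, rank=0, pseudo_grad=[1.0, 2.0])
        store.put(0, rank=1, pseudo_grad=[3.0, 6.0])
        print(store.gather_mean(0))            # [2.0, 4.0]
    print(sync_bytes(1_000_000_000) / 1e9)     # 2.0 (GB for 1B params in bf16)
```

Writing to a temporary name and renaming it afterwards keeps a polling reader from picking up a half-written object. On a real object store, where a single put is typically atomic, that step may be unnecessary.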
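
The docstring also says each backend adapter implements a shared Protocol and checks its optional dependency at adapter init time rather than at module import. A minimal sketch of that shape follows, assuming method names taken from the docstring example (`launch_replicas`, `collect`). The classes `Handle`, `Executor`, and `LazyBackendExecutor` are hypothetical stand-ins, not the package's `ReplicaHandle`, `ServerlessExecutor`, or real adapters, and the launch logic is a placeholder.

```python
from __future__ import annotations

import importlib.util
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Handle:
    """Hypothetical stand-in for a replica handle."""
    rank: int
    backend: str


@runtime_checkable
class Executor(Protocol):
    """Structural interface: any class with these two methods qualifies."""

    def launch_replicas(self, n_replicas: int, entrypoint: str,
                        entrypoint_args: dict[str, Any]) -> list[Handle]: ...

    def collect(self, handles: list[Handle], timeout: float) -> dict[int, Any]: ...


class LazyBackendExecutor:
    """Adapter skeleton that defers its dependency check to __init__."""

    def __init__(self, required_module: str) -> None:
        # Importing this file never needs the backend; constructing the
        # adapter does. find_spec returns None for a missing top-level module.
        if importlib.util.find_spec(required_module) is None:
            raise ImportError(
                f"backend requires '{required_module}'; install it first"
            )
        self.backend = required_module

    def launch_replicas(self, n_replicas: int, entrypoint: str,
                        entrypoint_args: dict[str, Any]) -> list[Handle]:
        # Placeholder: a real adapter would submit jobs to its backend here.
        return [Handle(rank=r, backend=self.backend) for r in range(n_replicas)]

    def collect(self, handles: list[Handle], timeout: float) -> dict[int, Any]:
        # Placeholder: a real adapter would wait for results up to `timeout`.
        return {h.rank: None for h in handles}


if __name__ == "__main__":
    executor = LazyBackendExecutor("json")  # stdlib module, always present
    print(isinstance(executor, Executor))   # True
    print(len(executor.launch_replicas(4, "pkg.entry", {})))  # 4
```

Deferring the check this way lets a package import every adapter unconditionally, as the `__init__.py` above does, while users install only the backends they actually construct.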