Training on Hugging Face with GPUs
This guide explains how to train the Energy Halting experiment on Hugging Face infrastructure, including local GPU training with accelerate and deployment to Hugging Face Spaces.
Prerequisites
- Hugging Face Account: Create one at huggingface.co.
- Access Token: Get a write token from huggingface.co/settings/tokens.
- Pixi: Installed locally.
1. Local Training with Accelerate
We use Hugging Face accelerate for multi-GPU and mixed-precision training.
Setup
Ensure dependencies are installed:
pixi install
Configure Accelerate
Run the configuration wizard to set up your GPU environment (e.g., number of GPUs, mixed precision):
pixi run accelerate config
Run Training
Use accelerate launch to start training. This handles device placement automatically.
pixi run accelerate launch tasks/image_classification/train_energy.py \
--energy_head_enabled \
--loss_type energy_contrastive \
--dataset cifar10 \
--batch_size 32 \
--use_amp \
--push_to_hub \
--hub_model_id <your-username>/ctm-energy-cifar10 \
--hub_token <your-token>
2. Deploying to Hugging Face Spaces (GPU)
You can run this training job on a Hugging Face Space with a GPU.
Create a Space
- Go to huggingface.co/new-space.
- Name: ctm-energy-training (or similar).
- SDK: Docker.
- Hardware: Choose a GPU instance (e.g., Nvidia T4, A10G).
Deploy Code
You can deploy by pushing your code to the Space's repository.
Clone the Space:
git clone https://huggingface.co/spaces/<your-username>/ctm-energy-training
cd ctm-energy-training
Copy Files: Copy your project files into this directory (excluding .git, .pixi, data, logs). Crucially, ensure Dockerfile, pixi.toml, pixi.lock, tasks/, models/, utils/, and configs/ are present.
Push:
git add .
git commit -m "Deploy training job"
git push
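The copy step can also be scripted. The following is a minimal sketch, not part of the project: the function name copy_project is hypothetical, and it assumes the exclusions apply only to the top level of the project, so a nested folder such as tasks/.../data is still copied.

```python
import shutil
from pathlib import Path

# Top-level entries the guide says to leave out of the Space repository.
EXCLUDED = (".git", ".pixi", "data", "logs")

# Entries the guide says must be present in the Space.
REQUIRED = ("Dockerfile", "pixi.toml", "pixi.lock", "tasks", "models", "utils", "configs")


def copy_project(project_dir, space_dir):
    """Copy the project into the cloned Space and report missing entries.

    Returns the names from REQUIRED that are absent in space_dir afterwards.
    """
    project_dir, space_dir = Path(project_dir), Path(space_dir)

    def ignore_top_level(directory, names):
        # Assumption: only skip excluded names at the project root,
        # so the Space's own .git directory is never overwritten.
        if Path(directory) == project_dir:
            return [name for name in names if name in EXCLUDED]
        return []

    shutil.copytree(
        project_dir,
        space_dir,
        ignore=ignore_top_level,
        dirs_exist_ok=True,  # the clone already exists (Python 3.8+)
    )
    return [name for name in REQUIRED if not (space_dir / name).exists()]
```

Calling copy_project("path/to/project", "ctm-energy-training") before git add would return an empty list when everything the Dockerfile needs has been copied.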
Environment Variables
To allow the Space to push the trained model back to the Hub, you need to set your HF token as a secret.
- Go to your Space's Settings.
- Scroll to Variables and secrets.
- Add a New Secret:
  - Name: HF_TOKEN
  - Value: Your write token.
Update Dockerfile CMD (Optional)
The default Dockerfile CMD prints help. To run training immediately upon deployment, modify the CMD in the Dockerfile before pushing:
CMD ["--energy_head_enabled", "--loss_type", "energy_contrastive", "--push_to_hub", "--hub_model_id", "<your-username>/ctm-energy-cifar10", "--hub_token", "$HF_TOKEN"]
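Note: An exec-form (JSON array) CMD is not processed by a shell, so "$HF_TOKEN" is passed to the script as a literal string rather than the secret's value. Either have the training script read HF_TOKEN from the environment, or start training through a shell or wrapper script that expands the variable.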
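One way to get the secret's value into --hub_token is a small Python launcher that reads HF_TOKEN from the environment and builds the argument list itself. This is a sketch under assumptions: the names build_command and main are hypothetical, and the launch prefix assumes the Space starts training the same way as the local command in section 1, which may not match the actual Dockerfile ENTRYPOINT.

```python
import os

# Training flags copied from the CMD example in this guide.
TRAIN_ARGS = [
    "--energy_head_enabled",
    "--loss_type", "energy_contrastive",
    "--push_to_hub",
    "--hub_model_id", "<your-username>/ctm-energy-cifar10",
]

# Assumption: same launch command as the local training section.
LAUNCH_PREFIX = [
    "pixi", "run", "accelerate", "launch",
    "tasks/image_classification/train_energy.py",
]


def build_command(env):
    """Return the full argv, with the token taken from env["HF_TOKEN"]."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret.")
    return LAUNCH_PREFIX + TRAIN_ARGS + ["--hub_token", token]


def main():
    command = build_command(os.environ)
    # Replace this process with the training process so that logs and
    # the exit code pass straight through to the Space.
    os.execvp(command[0], command)
```

A launcher script would end with a call to main(). Failing early when the secret is missing shows the cause in the Space logs instead of a failed upload at the end of training.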
3. Monitoring
- Local: Check the logs/ directory or WandB if enabled (--wandb).
- Spaces: Check the Logs tab in your Space.
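For local runs, a short helper can print the end of the newest log file. This is a sketch only: the function name tail_latest_log is hypothetical, and it assumes the training script writes plain-text files somewhere under logs/, which this guide does not specify.

```python
from collections import deque
from pathlib import Path


def tail_latest_log(log_dir="logs", lines=20):
    """Return the last `lines` lines of the newest file under log_dir."""
    files = [path for path in Path(log_dir).rglob("*") if path.is_file()]
    if not files:
        return []
    # Pick the most recently modified file.
    latest = max(files, key=lambda path: path.stat().st_mtime)
    # Assumption: the file is text; undecodable bytes are replaced.
    with latest.open("r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

For example, print("\n".join(tail_latest_log())) shows the last 20 lines of the most recent run.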