Textilindo AI Training Guide for Hugging Face Spaces
Training Options on Hugging Face Spaces
Option 1: Quick Training (Recommended for HF Spaces)
Use the lightweight training script designed for HF Spaces constraints.
Access Training Interface:
- Visit: https://harismlnaslm-Textilindo-AI.hf.space/train
- Click "Start Lightweight Training"
- Monitor progress in the training log
Manual Training:
python quick_train.py
Option 2: Use Existing Scripts
Run the full training scripts (may be resource-intensive):
# Check if training is ready
python scripts/check_training_ready.py
# Run lightweight training
python scripts/train_textilindo_ai_optimized.py
# Test the trained model
python scripts/test_textilindo_ai.py
Option 3: External Training + Upload
Train on external resources and upload the model:
Train locally or in the cloud:
python scripts/train_textilindo_ai.py
Upload the trained model to HF Hub:
huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
Use the uploaded model in your space
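The upload step can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and you are already authenticated (for example via `huggingface-cli login`). `upload_trained_model` is a hypothetical helper name, not one of the repository's scripts.

```python
from pathlib import Path


def upload_trained_model(folder: str, repo_id: str) -> None:
    """Upload a local model folder to the Hugging Face Hub (sketch)."""
    model_dir = Path(folder)
    # Fail early with a clear message before touching the network.
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model folder not found: {model_dir}")

    # Imported lazily so the folder check works even without the package.
    from huggingface_hub import HfApi

    api = HfApi()
    # Create the model repository if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Upload every file in the folder to the repository root.
    api.upload_folder(folder_path=str(model_dir), repo_id=repo_id, repo_type="model")
```

Calling `upload_trained_model("./models/trained-model", "your-username/textilindo-trained-model")` mirrors the CLI command above; both values are placeholders to replace with your own.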
Training Configuration
For HF Spaces (Limited Resources):
- Model: distilgpt2 (small, fast)
- Batch Size: 1
- Epochs: 1
- Max Length: 128 tokens
- Training Time: ~5 minutes
For External Training (Full Resources):
- Model: meta-llama/Llama-3.1-8B-Instruct
- Batch Size: 4-8
- Epochs: 3
- Max Length: 2048 tokens
- Training Time: Hours
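The two configurations can be kept side by side in code so a script picks the right one for its environment. This is a sketch only: `TrainingProfile` and `select_profile` are hypothetical names, and the external batch size uses 4, the low end of the 4-8 range listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingProfile:
    """Hypothetical container for the settings listed above."""
    model_name: str
    batch_size: int
    epochs: int
    max_length: int  # maximum sequence length in tokens


# Values for the resource-limited HF Spaces environment.
HF_SPACES = TrainingProfile("distilgpt2", batch_size=1, epochs=1, max_length=128)

# Values for external training; batch size 4 is the low end of the 4-8 range.
EXTERNAL = TrainingProfile(
    "meta-llama/Llama-3.1-8B-Instruct", batch_size=4, epochs=3, max_length=2048
)


def select_profile(on_hf_spaces: bool) -> TrainingProfile:
    """Pick the profile that matches where training runs."""
    return HF_SPACES if on_hf_spaces else EXTERNAL
```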
Training Data
Your space includes these training datasets:
- data/lora_dataset_20250829_113330.jsonl (33 samples)
- data/lora_dataset_20250910_145055.jsonl
- data/textilindo_training_data.jsonl
- data/training_data.jsonl
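To confirm how many samples each dataset holds before training, a short script can count the JSON lines per file. This sketch assumes one JSON object per line, which is the usual JSONL layout; `count_jsonl_samples` and `summarize_datasets` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def count_jsonl_samples(path: Path) -> int:
    """Return the number of non-empty lines that parse as JSON."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count


def summarize_datasets(data_dir: str = "data") -> dict:
    """Map each .jsonl file name in data_dir to its sample count."""
    files = sorted(Path(data_dir).glob("*.jsonl"))
    return {p.name: count_jsonl_samples(p) for p in files}
```

Running `summarize_datasets()` from the project root reports one count per file and raises a `ValueError` naming the first malformed line, which helps when diagnosing data problems.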
Training Endpoints
Web Interface:
- Training UI: /train
- Start Training: POST /train/start
- Check Status: GET /train/status
- View Data: GET /train/data
API Usage:
# Start training
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"
# Check resources
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"
# View training data
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
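The same calls can be made from Python using only the standard library. This is a minimal sketch: `build_request` and `call_endpoint` are hypothetical helpers, and decoding the body as JSON is an assumption, since the response format of these endpoints is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for one of the training endpoints."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


def call_endpoint(path: str, method: str = "GET", timeout: float = 30.0):
    """Send the request and decode the body, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(build_request(path, method), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_endpoint("/train/start", method="POST")` corresponds to the first curl command, and `call_endpoint("/train/status")` to the second.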
Limitations of HF Spaces Training
Resource Constraints:
- CPU Only: No GPU acceleration
- Memory: Limited to ~4GB RAM
- Time: 5-minute timeout for training
- Storage: Limited disk space
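One way to live with the 5-minute limit is for the training loop to track its own deadline and stop early, leaving time to save whatever was learned. The sketch below is illustrative: `train_within_budget`, `train_step`, and the 270-second default (a margin below five minutes) are assumptions, not the behavior of `quick_train.py`.

```python
import time
from typing import Callable, Iterable


def train_within_budget(
    batches: Iterable,
    train_step: Callable[[object], None],
    budget_seconds: float = 270.0,  # assumed margin below the 5-minute limit
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run train_step on each batch until the data or the time budget runs out.

    Returns the number of batches processed.
    """
    deadline = clock() + budget_seconds
    processed = 0
    for batch in batches:
        if clock() >= deadline:
            break  # stop cleanly so the caller can still save a checkpoint
        train_step(batch)
        processed += 1
    return processed
```

The deadline is checked between batches, so a single slow step can still overrun; with batch size 1 and short sequences that overrun should stay small.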
Recommended Approach:
- Quick Demo Training: Use quick_train.py for testing
- Full Training: Use external resources (Google Colab, AWS, etc.)
- Model Upload: Upload pre-trained models to HF Hub
External Training Options
Google Colab (Free GPU):
# Upload your training data
# Run: python scripts/train_textilindo_ai.py
# Download trained model
# Upload to HF Hub
Local Training:
# Setup environment
python scripts/setup_textilindo_training.py
# Download model
python scripts/download_model.py
# Run training
python scripts/train_textilindo_ai.py
# Test model
python scripts/test_textilindo_ai.py
Cloud Training (AWS/GCP):
# Use the monitoring script
python scripts/train_with_monitoring.py
Training Progress Monitoring
On HF Spaces:
- Check the training log in the web interface
- Use the /train/status endpoint for resource monitoring
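Checking `/train/status` by hand gets tedious during a run, so a small polling loop can do it instead. This sketch assumes the endpoint returns a JSON object; its fields are not documented here, so the loop prints each snapshot as-is. `fetch_status` and `poll_status` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://harismlnaslm-Textilindo-AI.hf.space/train/status"


def fetch_status(url: str = STATUS_URL, timeout: float = 30.0) -> dict:
    """GET the status endpoint, assuming it returns a JSON object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_status(
    polls: int = 10,
    interval_seconds: float = 15.0,
    fetch: Callable[[], dict] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Collect `polls` status snapshots, printing each one as it arrives."""
    snapshots = []
    for attempt in range(polls):
        snapshot = fetch()
        print(f"[poll {attempt + 1}/{polls}] {snapshot}")
        snapshots.append(snapshot)
        if attempt < polls - 1:
            sleep(interval_seconds)  # no sleep after the last poll
    return snapshots
```

The `fetch` and `sleep` parameters exist so the loop can be exercised offline with stand-ins before pointing it at the live Space.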
External Training:
# Use monitoring script
python scripts/train_with_monitoring.py
# Check logs
tail -f logs/training.log
Testing Trained Models
Quick Test:
python quick_train.py # Includes testing
Full Testing:
python scripts/test_textilindo_ai.py
python scripts/test_model.py
API Testing:
# Test chat endpoint
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
-H "Content-Type: application/json" \
-d '{"message": "dimana lokasi textilindo?"}'
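For scripted checks, the chat request can be built in Python with the standard library. This is a sketch: `build_chat_request` and `send_chat` are hypothetical helpers, and the response is returned as raw text because its structure is not specified here.

```python
import json
import urllib.request

CHAT_URL = "https://harismlnaslm-Textilindo-AI.hf.space/chat"


def build_chat_request(message: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, timeout: float = 60.0) -> str:
    """Send the message and return the raw response body as text."""
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

`send_chat("dimana lokasi textilindo?")` sends the same question as the curl example.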
Troubleshooting
Common Issues:
"Out of Memory"
- Use smaller models (distilgpt2)
- Reduce batch size
- Use external training
"Training Timeout"
- Training on HF Spaces has a 5-minute limit
- Use external resources for full training
"Model Not Found"
- Check if model is downloaded
- Run python scripts/download_model.py
"Data Not Found"
- Verify data files exist in the data/ directory
- Check file permissions
Next Steps
- Start with Quick Training: Test the setup with quick_train.py
- Monitor Resources: Use /train/status to check available resources
- External Training: For full training, use external resources
- Model Upload: Upload trained models to Hugging Face Hub
- Integration: Use uploaded models in your space
Success Indicators
- Training completes without errors
- Model saves to the ./models/ directory
- Test responses are generated
- Chat interface works with trained model
- API endpoints respond correctly
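The "model saves" indicator is easy to check programmatically. The sketch below lists whatever files exist under `./models/`; `model_files` is a hypothetical helper, and it makes no assumption about which file names a successful run produces.

```python
from pathlib import Path


def model_files(models_dir: str = "./models") -> list:
    """List regular files under the models directory, or [] if it is missing."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

An empty result after training suggests the run did not save anything and the training log is the next place to look.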
Happy Training!