# 🤖 Textilindo AI Training Guide for Hugging Face Spaces
## 🚀 Training Options on Hugging Face Spaces
### Option 1: **Quick Training (Recommended for HF Spaces)**
Use the lightweight training script designed for HF Spaces constraints.
**Access Training Interface:**
- Visit: `https://harismlnaslm-Textilindo-AI.hf.space/train`
- Click "Start Lightweight Training"
- Monitor progress in the training log
**Manual Training:**
```bash
python quick_train.py
```
### Option 2: **Use Existing Scripts**
Run the full training scripts (may be resource-intensive):
```bash
# Check if training is ready
python scripts/check_training_ready.py
# Run the optimized training script
python scripts/train_textilindo_ai_optimized.py
# Test the trained model
python scripts/test_textilindo_ai.py
```
### Option 3: **External Training + Upload**
Train on external resources and upload the model:
1. **Train locally or in the cloud:**
   ```bash
   python scripts/train_textilindo_ai.py
   ```
2. **Upload the trained model to the HF Hub:**
   ```bash
   huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
   ```
3. **Use the uploaded model in your Space**
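Step 3 could look roughly like the sketch below. It assumes the Space loads the model with the `transformers` library and that the model is a causal LM saved in the standard `transformers` format; the guide does not say which loader the app actually uses. The repo id is the placeholder from step 2, and `resolve_model_source` and `load_model` are hypothetical helper names.

```python
from pathlib import Path

# Placeholder repo id from the upload step; replace with your own.
HUB_REPO_ID = "your-username/textilindo-trained-model"
LOCAL_MODEL_DIR = "./models/trained-model"


def resolve_model_source(local_dir: str = LOCAL_MODEL_DIR,
                         repo_id: str = HUB_REPO_ID) -> str:
    """Prefer a non-empty local model directory, else fall back to the Hub repo id."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return repo_id


def load_model(source: str):
    """Load tokenizer and model (assumption: a causal LM in transformers format)."""
    # Imported lazily so resolve_model_source works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(source)
    model = AutoModelForCausalLM.from_pretrained(source)
    return tokenizer, model
```

Calling `load_model(resolve_model_source())` downloads the model from the Hub on first use when no local copy exists.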
## 🔧 Training Configuration
### For HF Spaces (Limited Resources):
- **Model**: `distilgpt2` (small, fast)
- **Batch Size**: 1
- **Epochs**: 1
- **Max Length**: 128 tokens
- **Training Time**: ~5 minutes
### For External Training (Full Resources):
- **Model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Batch Size**: 4-8
- **Epochs**: 3
- **Max Length**: 2048 tokens
- **Training Time**: Hours
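The two profiles above can be written down as plain configuration objects, which also makes the step count easy to estimate. This is a minimal sketch: `TrainingConfig`, `HF_SPACES`, `EXTERNAL`, and `total_steps` are illustrative names that are not taken from the project's scripts, the external batch size of 4 is simply the low end of the 4-8 range, and the step estimate assumes no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    model_name: str
    batch_size: int
    epochs: int
    max_length: int  # maximum sequence length in tokens

    def total_steps(self, num_samples: int) -> int:
        """Optimizer steps for num_samples examples (no gradient accumulation)."""
        return math.ceil(num_samples / self.batch_size) * self.epochs


# Values copied from the "For HF Spaces" list.
HF_SPACES = TrainingConfig("distilgpt2", batch_size=1, epochs=1, max_length=128)

# Values from the "For External Training" list; batch size 4 is an assumed pick.
EXTERNAL = TrainingConfig("meta-llama/Llama-3.1-8B-Instruct",
                          batch_size=4, epochs=3, max_length=2048)

# 33 samples is the size this guide gives for its first dataset.
print(HF_SPACES.total_steps(33))  # 33
print(EXTERNAL.total_steps(33))   # 27
```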
## 📊 Training Data
Your space includes these training datasets:
- `data/lora_dataset_20250829_113330.jsonl` (33 samples)
- `data/lora_dataset_20250910_145055.jsonl`
- `data/textilindo_training_data.jsonl`
- `data/training_data.jsonl`
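To see how many samples each dataset holds before training, something like the following sketch may help. It only assumes that each file is JSON Lines (one JSON value per line); it makes no assumption about the fields inside each record. `count_samples` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def count_samples(path: str) -> int:
    """Count non-empty lines in a JSONL file, checking that each parses as JSON."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
            count += 1
    return count


def summarize(data_dir: str = "data") -> dict:
    """Map each .jsonl file name in data_dir to its sample count."""
    files = sorted(Path(data_dir).glob("*.jsonl"))
    return {p.name: count_samples(str(p)) for p in files}
```

Running `summarize()` from the repository root returns a dictionary keyed by file name, which can be compared against the sample counts listed above.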
## 🎯 Training Endpoints
### Web Interface:
- **Training UI**: `/train`
- **Start Training**: `POST /train/start`
- **Check Status**: `GET /train/status`
- **View Data**: `GET /train/data`
### API Usage:
```bash
# Start training
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"
# Check training status and resource usage
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"
# View training data
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
```
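The same calls can be scripted in Python, for example to start a run and poll until it finishes. The sketch below uses only the standard library. It assumes the endpoints return JSON, which the guide does not confirm, and it leaves the "finished" check to the caller because the shape of the status response is not documented here. `build_request`, `call`, and `poll_status` are hypothetical names.

```python
import json
import time
import urllib.request

BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Create a request for one of the training endpoints."""
    return urllib.request.Request(BASE_URL + path, method=method)


def call(path: str, method: str = "GET", timeout: float = 30.0):
    """Send the request and decode the body (assumption: the body is JSON)."""
    with urllib.request.urlopen(build_request(path, method), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_status(is_done, fetch=lambda: call("/train/status"),
                interval: float = 10.0, max_polls: int = 30, sleep=time.sleep):
    """Poll the status endpoint until is_done(status) is true or max_polls is hit."""
    status = None
    for attempt in range(max_polls):
        status = fetch()
        if is_done(status):
            break
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next poll
    return status
```

A run would start with `call("/train/start", method="POST")`, followed by `poll_status(is_done)` where `is_done` inspects whatever fields the status response actually contains.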
## ⚠️ Limitations of HF Spaces Training
### Resource Constraints:
- **CPU Only**: No GPU acceleration
- **Memory**: Limited to ~4GB RAM
- **Time**: 5-minute timeout for training
- **Storage**: Limited disk space
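One way to live with the 5-minute limit is to stop the training loop early and leave time to save the model. The following is a minimal sketch of that idea, not how `quick_train.py` is known to work: `run_with_budget` is a hypothetical helper, `step` stands in for one training step, and the 30-second safety margin is an assumption.

```python
import time

TIME_LIMIT_SECONDS = 5 * 60   # the 5-minute limit listed above
SAFETY_MARGIN_SECONDS = 30    # assumed margin reserved for saving the model


def run_with_budget(step, max_steps: int,
                    budget: float = TIME_LIMIT_SECONDS - SAFETY_MARGIN_SECONDS,
                    clock=time.monotonic) -> int:
    """Call step(i) until max_steps or the time budget is reached.

    Returns the number of steps that actually ran.
    """
    start = clock()
    completed = 0
    for i in range(max_steps):
        if clock() - start >= budget:
            break  # stop early so there is time left to save a checkpoint
        step(i)
        completed += 1
    return completed
```

The budget is checked before each step, so a single very slow step can still overrun; keeping steps short (batch size 1, 128-token sequences) limits that risk.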
### Recommended Approach:
1. **Quick Demo Training**: Use `quick_train.py` for testing
2. **Full Training**: Use external resources (Google Colab, AWS, etc.)
3. **Model Upload**: Upload the trained model to the HF Hub
## 🚀 External Training Options
### Google Colab (Free GPU):
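```python
# Run each step in a Colab notebook cell; a leading "!" runs a shell command.
# 1. Upload your training data to the Colab session.
# 2. Train the model:
!python scripts/train_textilindo_ai.py
# 3. Download the trained model, or upload it to the HF Hub.
```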
### Local Training:
```bash
# Setup environment
python scripts/setup_textilindo_training.py
# Download model
python scripts/download_model.py
# Run training
python scripts/train_textilindo_ai.py
# Test model
python scripts/test_textilindo_ai.py
```
### Cloud Training (AWS/GCP):
```bash
# Use the monitoring script
python scripts/train_with_monitoring.py
```
## 📈 Training Progress Monitoring
### On HF Spaces:
- Check the training log in the web interface
- Use the `/train/status` endpoint for resource monitoring
### External Training:
```bash
# Use monitoring script
python scripts/train_with_monitoring.py
# Check logs
tail -f logs/training.log
```
## 🧪 Testing Trained Models
### Quick Test:
```bash
python quick_train.py # Includes testing
```
### Full Testing:
```bash
python scripts/test_textilindo_ai.py
python scripts/test_model.py
```
### API Testing:
```bash
# Test chat endpoint
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
-H "Content-Type: application/json" \
-d '{"message": "dimana lokasi textilindo?"}'
```
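For scripted checks, the curl call above can be mirrored in Python with the standard library. This is a sketch under one assumption: the response is returned as raw text because the guide does not describe the response format. `build_chat_request` and `send_chat` are hypothetical names.

```python
import json
import urllib.request

CHAT_URL = "https://harismlnaslm-Textilindo-AI.hf.space/chat"


def build_chat_request(message: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, timeout: float = 30.0) -> str:
    """Send the message and return the raw response body as text."""
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(send_chat("dimana lokasi textilindo?"))` sends the same question as the curl test.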
## 🔧 Troubleshooting
### Common Issues:
1. **"Out of Memory"**
   - Use a smaller model (`distilgpt2`)
   - Reduce the batch size
   - Use external training
2. **"Training Timeout"**
   - Training on HF Spaces has a 5-minute limit
   - Use external resources for full training
3. **"Model Not Found"**
   - Check that the model has been downloaded
   - Run `python scripts/download_model.py`
4. **"Data Not Found"**
   - Verify that the data files exist in the `data/` directory
   - Check file permissions
## 📚 Next Steps
1. **Start with Quick Training**: Test the setup with `quick_train.py`
2. **Monitor Resources**: Use `/train/status` to check available resources
3. **External Training**: For full training, use external resources
4. **Model Upload**: Upload trained models to Hugging Face Hub
5. **Integration**: Use uploaded models in your space
## 🎉 Success Indicators
- ✅ Training completes without errors
- ✅ Model saves to `./models/` directory
- ✅ Test responses are generated
- ✅ Chat interface works with trained model
- ✅ API endpoints respond correctly
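The "model saves to `./models/`" indicator is easy to check from a script. The sketch below simply lists the files under that directory; it assumes nothing about which files a successful run writes, and `saved_model_files` is a hypothetical helper name.

```python
from pathlib import Path


def saved_model_files(models_dir: str = "./models") -> list:
    """Return relative paths of all files under models_dir (empty if none exist)."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

An empty list after training suggests the run did not save a model, or saved it somewhere else.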
---
*Happy Training! 🚀*