# Textilindo AI Training Guide for Hugging Face Spaces

## Training Options on Hugging Face Spaces

### Option 1: **Quick Training (Recommended for HF Spaces)**

Use the lightweight training script designed for HF Spaces constraints.

**Access Training Interface:**

- Visit: `https://harismlnaslm-Textilindo-AI.hf.space/train`
- Click "Start Lightweight Training"
- Monitor progress in the training log

**Manual Training:**

```bash
python quick_train.py
```
### Option 2: **Use Existing Scripts**

Run the full training scripts (may be resource-intensive):

```bash
# Check that the training setup is ready
python scripts/check_training_ready.py
# Run the optimized training script
python scripts/train_textilindo_ai_optimized.py
# Test the trained model
python scripts/test_textilindo_ai.py
```
### Option 3: **External Training + Upload**

Train on external resources and upload the model:

1. **Train locally or in the cloud:**
   ```bash
   python scripts/train_textilindo_ai.py
   ```
2. **Upload the trained model to HF Hub:**
   ```bash
   huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
   ```
3. **Use the uploaded model in your Space**
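
Step 3 could look roughly like the sketch below inside the Space's application code. It assumes the Space loads models with the `transformers` library and that the Hub repo ID is passed through an environment variable. The variable name `TEXTILINDO_MODEL_ID` and both helper functions are hypothetical, not part of the project. When the variable is unset, the sketch falls back to `distilgpt2`, the small model this guide recommends for HF Spaces.

```python
import os

# Hypothetical environment variable; set it in the Space settings.
MODEL_ENV_VAR = "TEXTILINDO_MODEL_ID"
# Small fallback model, matching the HF Spaces configuration in this guide.
FALLBACK_MODEL = "distilgpt2"


def resolve_model_id(env=None):
    """Return the Hub repo ID to load, or the fallback if none is configured."""
    env = os.environ if env is None else env
    value = env.get(MODEL_ENV_VAR, "").strip()
    return value or FALLBACK_MODEL


def load_model(model_id):
    """Download (or reuse the cached copy of) a causal LM and its tokenizer."""
    # Imported lazily so resolve_model_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

With `TEXTILINDO_MODEL_ID=your-username/textilindo-trained-model` set, calling `load_model(resolve_model_id())` would fetch the uploaded model. A private repo would additionally need an access token, which this sketch does not handle.
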
## Training Configuration

### For HF Spaces (Limited Resources):

- **Model**: `distilgpt2` (small, fast)
- **Batch Size**: 1
- **Epochs**: 1
- **Max Length**: 128 tokens
- **Training Time**: ~5 minutes

### For External Training (Full Resources):

- **Model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Batch Size**: 4-8
- **Epochs**: 3
- **Max Length**: 2048 tokens
- **Training Time**: Hours
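
The two configurations above can be captured as data so that one script can switch between them. This is only a sketch: `TrainingProfile` and `pick_profile` are hypothetical names, the project's scripts may store their settings differently, and the external batch size uses 4, the lower end of the 4-8 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingProfile:
    model_name: str
    batch_size: int
    epochs: int
    max_length: int  # maximum sequence length in tokens


# Values copied from the two configuration lists in this guide.
HF_SPACES = TrainingProfile("distilgpt2", batch_size=1, epochs=1, max_length=128)
EXTERNAL = TrainingProfile(
    "meta-llama/Llama-3.1-8B-Instruct", batch_size=4, epochs=3, max_length=2048
)


def pick_profile(has_gpu: bool) -> TrainingProfile:
    """Use the full profile only when a GPU is available."""
    return EXTERNAL if has_gpu else HF_SPACES
```

On a CPU-only Space, `pick_profile(False)` selects the `distilgpt2` profile.
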
## Training Data

Your Space includes these training datasets:

- `data/lora_dataset_20250829_113330.jsonl` (33 samples)
- `data/lora_dataset_20250910_145055.jsonl`
- `data/textilindo_training_data.jsonl`
- `data/training_data.jsonl`
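
To see how many samples each dataset holds before training, a short script can count the records. The sketch below assumes only that each file is JSON Lines with one JSON object per line. The record fields are not documented here, so it does not inspect them. `count_samples` and `summarize` are hypothetical helpers, not project scripts.

```python
import json
from pathlib import Path


def count_samples(path):
    """Return (valid, invalid) counts of non-empty lines in a JSONL file."""
    valid, invalid = 0, 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank lines are not samples
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                invalid += 1
                continue
            # A sample is expected to be a JSON object, not a bare value.
            if isinstance(record, dict):
                valid += 1
            else:
                invalid += 1
    return valid, invalid


def summarize(data_dir="data"):
    """Map each .jsonl file name in data_dir to its (valid, invalid) counts."""
    return {p.name: count_samples(p) for p in sorted(Path(data_dir).glob("*.jsonl"))}
```

Running `summarize()` from the repository root returns one entry per dataset, which makes malformed lines visible before they interrupt a training run.
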
## Training Endpoints

### Web Interface:

- **Training UI**: `/train`
- **Start Training**: `POST /train/start`
- **Check Status**: `GET /train/status`
- **View Data**: `GET /train/data`

### API Usage:

```bash
# Start training
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"
# Check training status and resources
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"
# View training data
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
```
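
The same three calls can be made from Python with only the standard library. This is a sketch: the routes and HTTP methods come from the endpoint list above, but the response format is not documented in this guide, so `call` returns parsed JSON when the body is JSON and plain text otherwise. `ROUTES`, `build_request`, and `call` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"

# Route table taken from the endpoint list in this guide.
ROUTES = {
    "start": ("POST", "/train/start"),
    "status": ("GET", "/train/status"),
    "data": ("GET", "/train/data"),
}


def build_request(action, base_url=BASE_URL):
    """Create the request object for one of the training endpoints."""
    method, path = ROUTES[action]
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


def call(action, timeout=30):
    """Send the request and return the body, parsed as JSON when possible."""
    with urllib.request.urlopen(build_request(action), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body  # assumption: unknown response format, fall back to text
```

For example, `call("start")` followed by periodic `call("status")` mirrors the first two `curl` commands. If the Space is sleeping, the first request may be slow or fail while it wakes up.
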
## Limitations of HF Spaces Training

### Resource Constraints:

- **CPU Only**: No GPU acceleration
- **Memory**: Limited to ~4GB RAM
- **Time**: 5-minute timeout for training
- **Storage**: Limited disk space
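
One way to respect the 5-minute limit is to give the training loop an explicit time budget and stop cleanly before the run is cut off. The sketch below is an illustration only: `TimeBudget`, `run_steps`, and the 270-second budget (5 minutes minus a 30-second safety margin) are assumptions, not part of the project's scripts.

```python
import time

# Assumption: leave a 30-second margin inside the 5-minute limit for saving.
SPACES_BUDGET_SECONDS = 5 * 60 - 30


class TimeBudget:
    """Tracks elapsed time against a fixed budget in seconds."""

    def __init__(self, seconds=SPACES_BUDGET_SECONDS, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + seconds

    def remaining(self):
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() == 0.0


def run_steps(steps, budget):
    """Run callables in order and stop early once the budget is used up.

    Returns the number of steps that were actually executed.
    """
    completed = 0
    for step in steps:
        if budget.expired():
            break
        step()
        completed += 1
    return completed
```

Each callable in `steps` would wrap one training step or batch. The budget is only checked between steps, so a single step longer than the margin can still overrun.
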
### Recommended Approach:

1. **Quick Demo Training**: Use `quick_train.py` for testing
2. **Full Training**: Use external resources (Google Colab, AWS, etc.)
3. **Model Upload**: Upload trained models to HF Hub
## External Training Options

### Google Colab (Free GPU):

```python
# Run these steps in a Colab notebook:
# 1. Upload your training data
# 2. Run: !python scripts/train_textilindo_ai.py
# 3. Download the trained model
# 4. Upload it to HF Hub
```

### Local Training:

```bash
# Set up the environment
python scripts/setup_textilindo_training.py
# Download the model
python scripts/download_model.py
# Run training
python scripts/train_textilindo_ai.py
# Test the model
python scripts/test_textilindo_ai.py
```

### Cloud Training (AWS/GCP):

```bash
# Use the monitoring script
python scripts/train_with_monitoring.py
```
## Training Progress Monitoring

### On HF Spaces:

- Check the training log in the web interface
- Use the `/train/status` endpoint for resource monitoring

### External Training:

```bash
# Use the monitoring script
python scripts/train_with_monitoring.py
# Check logs
tail -f logs/training.log
```
## Testing Trained Models

### Quick Test:

```bash
python quick_train.py  # Includes testing
```

### Full Testing:

```bash
python scripts/test_textilindo_ai.py
python scripts/test_model.py
```

### API Testing:

```bash
# Test the chat endpoint
# ("dimana lokasi textilindo?" is Indonesian for "where is Textilindo located?")
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "dimana lokasi textilindo?"}'
```
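
The chat test can also be scripted in Python, which is convenient when checking several questions in a row. The sketch below sends the same JSON payload as the `curl` command. The shape of the response is not documented in this guide, so `ask` returns the raw response text. `build_chat_request` and `ask` are hypothetical helpers.

```python
import json
import urllib.request

CHAT_URL = "https://harismlnaslm-Textilindo-AI.hf.space/chat"


def build_chat_request(message, url=CHAT_URL):
    """Build a POST request carrying {"message": ...} as a JSON body."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message, timeout=60):
    """Send one chat message and return the raw response body as text."""
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `ask("dimana lokasi textilindo?")` reproduces the `curl` test above.
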
## Troubleshooting

### Common Issues:

1. **"Out of Memory"**
   - Use smaller models (`distilgpt2`)
   - Reduce batch size
   - Use external training
2. **"Training Timeout"**
   - HF Spaces has a 5-minute limit
   - Use external resources for full training
3. **"Model Not Found"**
   - Check that the model is downloaded
   - Use `python scripts/download_model.py`
4. **"Data Not Found"**
   - Verify data files exist in the `data/` directory
   - Check file permissions
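
The "Data Not Found" checks can be automated with a small preflight script. This sketch is independent of the project's own `scripts/check_training_ready.py`, whose behavior is not described here. `check_data_files` is a hypothetical helper, and the file list is copied from the Training Data section.

```python
import os
from pathlib import Path

# Dataset paths listed in the Training Data section of this guide.
DATA_FILES = [
    "data/lora_dataset_20250829_113330.jsonl",
    "data/lora_dataset_20250910_145055.jsonl",
    "data/textilindo_training_data.jsonl",
    "data/training_data.jsonl",
]


def check_data_files(root=".", files=DATA_FILES):
    """Return a list of problems found; an empty list means all files look usable."""
    problems = []
    for rel in files:
        path = Path(root) / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {rel}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {rel}")
    return problems
```

Running `check_data_files()` from the repository root before training reports missing, unreadable, or empty datasets up front.
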
## Next Steps

1. **Start with Quick Training**: Test the setup with `quick_train.py`
2. **Monitor Resources**: Use `/train/status` to check available resources
3. **External Training**: For full training, use external resources
4. **Model Upload**: Upload trained models to Hugging Face Hub
5. **Integration**: Use uploaded models in your Space

## Success Indicators

- Training completes without errors
- Model saves to the `./models/` directory
- Test responses are generated
- Chat interface works with the trained model
- API endpoints respond correctly

---

*Happy Training!*