
	# 🤖 Textilindo AI Training Guide for Hugging Face Spaces

	## 🚀 Training Options on Hugging Face Spaces

	### Option 1: Quick Training (Recommended for HF Spaces)
	Use the lightweight training script designed for HF Spaces constraints.

	Access Training Interface:
	- Visit: `https://harismlnaslm-Textilindo-AI.hf.space/train`
	- Click "Start Lightweight Training"
	- Monitor progress in the training log

	Manual Training:
	```bash
	python quick_train.py
	```

	### Option 2: Use Existing Scripts
	Run the full training scripts (may be resource-intensive):

	```bash
	# Check if training is ready
	python scripts/check_training_ready.py

	# Run the optimized training script
	python scripts/train_textilindo_ai_optimized.py

	# Test the trained model
	python scripts/test_textilindo_ai.py
	```
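
The three scripts above are meant to run in order, so it can help to stop as soon as one of them fails. The following is a minimal sketch of such a runner. It assumes each script signals failure with a non-zero exit code, which this guide does not confirm; `run_steps` and `PIPELINE` are hypothetical names introduced for illustration.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of (command, returncode) pairs that were executed.
    """
    results = []
    for cmd in steps:
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
        if completed.returncode != 0:
            break  # later steps depend on the earlier ones succeeding
    return results


# Script paths are the ones listed above; sys.executable reuses the
# interpreter that is running this file.
PIPELINE = [
    [sys.executable, "scripts/check_training_ready.py"],
    [sys.executable, "scripts/train_textilindo_ai_optimized.py"],
    [sys.executable, "scripts/test_textilindo_ai.py"],
]
```

Calling `run_steps(PIPELINE)` from the repository root would execute the readiness check, the training run, and the test in sequence.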

	### Option 3: External Training + Upload
	Train on external resources and upload the model:

	1. Train locally or on cloud:
	```bash
	python scripts/train_textilindo_ai.py
	```

	2. Upload trained model to HF Hub:
	```bash
	huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
	```

	3. Use the uploaded model in your space
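
Step 2 can also be done from Python with the `huggingface_hub` library instead of the CLI. The sketch below assumes the library is installed and that you are already authenticated (for example via `huggingface-cli login`). The helper name `upload_model` is hypothetical; the repo id and folder are the placeholders from the command above.

```python
from pathlib import Path


def upload_model(folder, repo_id):
    """Upload every file in `folder` to the model repo `repo_id` on the Hub."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Model folder not found: {folder}")

    # Imported lazily so the path check above works even without the library.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the locally saved token or the HF_TOKEN variable
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    return api.upload_folder(
        folder_path=str(folder), repo_id=repo_id, repo_type="model"
    )
```

For example, `upload_model("./models/trained-model", "your-username/textilindo-trained-model")` mirrors the CLI call.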

	## 🔧 Training Configuration

	### For HF Spaces (Limited Resources):
	- Model: `distilgpt2` (small, fast)
	- Batch Size: 1
	- Epochs: 1
	- Max Length: 128 tokens
	- Training Time: ~5 minutes

	### For External Training (Full Resources):
	- Model: `meta-llama/Llama-3.1-8B-Instruct`
	- Batch Size: 4-8
	- Epochs: 3
	- Max Length: 2048 tokens
	- Training Time: Hours
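
One way to keep the two presets above in a single place is a small config object. This is only a sketch: `TrainConfig` and `pick_config` are hypothetical names, not part of the repository's scripts, and the external batch size of 4 is simply the lower end of the 4-8 range given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    model_name: str
    batch_size: int
    epochs: int
    max_length: int  # maximum sequence length in tokens


# Values copied from the two lists above.
SPACES_CONFIG = TrainConfig("distilgpt2", batch_size=1, epochs=1, max_length=128)
# The guide gives a range of 4-8; 4 is chosen here as the conservative end.
EXTERNAL_CONFIG = TrainConfig(
    "meta-llama/Llama-3.1-8B-Instruct", batch_size=4, epochs=3, max_length=2048
)


def pick_config(on_spaces):
    """Return the preset that matches the environment."""
    return SPACES_CONFIG if on_spaces else EXTERNAL_CONFIG
```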

	## 📊 Training Data

	Your space includes these training datasets:
	- `data/lora_dataset_20250829_113330.jsonl` (33 samples)
	- `data/lora_dataset_20250910_145055.jsonl`
	- `data/textilindo_training_data.jsonl`
	- `data/training_data.jsonl`
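
To confirm how many samples a dataset contains before training, the files can be read line by line. The sketch below relies only on the `.jsonl` extension (one JSON value per line) and makes no assumption about the field names inside each record; `load_jsonl` is a hypothetical helper.

```python
import json


def load_jsonl(path):
    """Parse a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return records
```

`len(load_jsonl("data/lora_dataset_20250829_113330.jsonl"))` should then match the sample count listed above.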

	## 🎯 Training Endpoints

	### Web Interface:
	- Training UI: `/train`
	- Start Training: `POST /train/start`
	- Check Status: `GET /train/status`
	- View Data: `GET /train/data`

	### API Usage:
	```bash
	# Start training
	curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"

	# Check training status and resource usage
	curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"

	# View training data
	curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
	```
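
The same calls can be scripted in Python, for instance to start a run and poll until it finishes. This guide does not document the response format, so the sketch assumes `/train/status` returns JSON with a `"status"` field whose final values are `"completed"` or `"failed"`. That convention is hypothetical; pass your own `is_done` if the real payload differs. The default `max_wait` of 300 seconds mirrors the 5-minute limit mentioned in this guide.

```python
import json
import time
import urllib.request

BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"


def start_training(base_url=BASE_URL):
    """POST /train/start with no body, mirroring the curl call above."""
    req = urllib.request.Request(f"{base_url}/train/start", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def fetch_status(base_url=BASE_URL):
    """GET /train/status and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(f"{base_url}/train/status", timeout=30) as resp:
        return json.load(resp)


def _default_is_done(status):
    # Hypothetical convention: a "status" field with a terminal value.
    return status.get("status") in ("completed", "failed")


def wait_until_done(fetch=fetch_status, is_done=_default_is_done,
                    interval=10.0, max_wait=300.0, sleep=time.sleep):
    """Poll `fetch` until `is_done(status)` is true or `max_wait` seconds pass.

    Returns the final status, or None on timeout.
    """
    waited = 0.0
    while waited <= max_wait:
        status = fetch()
        if is_done(status):
            return status
        sleep(interval)
        waited += interval
    return None
```

Typical use would be `start_training()` followed by `wait_until_done()`.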

	## ⚠️ Limitations of HF Spaces Training

	### Resource Constraints:
	- CPU Only: No GPU acceleration
	- Memory: Limited to ~4GB RAM
	- Time: 5-minute timeout for training
	- Storage: Limited disk space
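
A back-of-the-envelope estimate shows why the ~4GB limit matters. The sketch below assumes full fine-tuning in 32-bit floats with an Adam-style optimizer, which keeps roughly four copies of every parameter (weights, gradients, and two moment buffers). It ignores activations, so it is a lower bound; parameter-efficient methods such as LoRA would need less. The parameter counts are approximate, and the function names are hypothetical.

```python
def training_memory_gb(n_params, bytes_per_param=4, copies=4):
    """Rough lower bound in GB: weights + gradients + two Adam moment buffers."""
    return n_params * bytes_per_param * copies / 1e9


def fits(n_params, budget_gb=4.0):
    """True if the estimate stays under the memory budget."""
    return training_memory_gb(n_params) < budget_gb


DISTILGPT2_PARAMS = 82_000_000    # approximate
LLAMA_8B_PARAMS = 8_000_000_000   # approximate
```

Under these assumptions `distilgpt2` needs about 1.3 GB and fits within the limit, while an 8B-parameter model needs on the order of 128 GB and does not.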

	### Recommended Approach:
	1. Quick Demo Training: Use `quick_train.py` for testing
	2. Full Training: Use external resources (Google Colab, AWS, etc.)
	3. Model Upload: Upload the externally trained model to HF Hub

	## 🚀 External Training Options

	### Google Colab (Free GPU):
	```python
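	# Run these steps as Colab notebook cells; a leading "!" runs a shell command.
	# 1. Upload your training data into data/
	# 2. Train:
	!python scripts/train_textilindo_ai.py
	# 3. Download the trained model, then upload it to HF Hub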
	```

	### Local Training:
	```bash
	# Setup environment
	python scripts/setup_textilindo_training.py

	# Download model
	python scripts/download_model.py

	# Run training
	python scripts/train_textilindo_ai.py

	# Test model
	python scripts/test_textilindo_ai.py
	```

	### Cloud Training (AWS/GCP):
	```bash
	# Use the monitoring script
	python scripts/train_with_monitoring.py
	```

	## 📈 Training Progress Monitoring

	### On HF Spaces:
	- Check the training log in the web interface
	- Use the `/train/status` endpoint for resource monitoring

	### External Training:
	```bash
	# Use monitoring script
	python scripts/train_with_monitoring.py

	# Check logs
	tail -f logs/training.log
	```
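
If you would rather inspect the log from Python (for example inside a notebook), a one-shot equivalent of `tail -n` is easy to write. This sketch assumes the log is plain text at `logs/training.log`, as in the command above. Unlike `tail -f`, it returns a snapshot and does not follow the file.

```python
from collections import deque


def tail(path, n=20):
    """Return the last `n` lines of a text file, keeping only `n` lines in memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

For example, `print("\n".join(tail("logs/training.log")))` prints the most recent 20 lines.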

	## 🧪 Testing Trained Models

	### Quick Test:
	```bash
	python quick_train.py # Includes testing
	```

	### Full Testing:
	```bash
	python scripts/test_textilindo_ai.py
	python scripts/test_model.py
	```

	### API Testing:
	```bash
	# Test chat endpoint (the message is Indonesian for "where is Textilindo located?")
	curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
	-H "Content-Type: application/json" \
	-d '{"message": "dimana lokasi textilindo?"}'
	```
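
The curl call above translates directly to Python's standard library. The sketch builds the same JSON POST request. The response schema is not documented in this guide, so `send` returns the raw response text rather than assuming any fields. Both function names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(message, base_url="https://harismlnaslm-Textilindo-AI.hf.space"):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would be `send(build_chat_request("dimana lokasi textilindo?"))`.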

	## 🔧 Troubleshooting

	### Common Issues:

	1. "Out of Memory"
	   - Use a smaller model (`distilgpt2`)
	   - Reduce the batch size
	   - Train on external resources instead

	2. "Training Timeout"
	   - Training on HF Spaces is limited to 5 minutes
	   - Use external resources for full training

	3. "Model Not Found"
	   - Check that the model has been downloaded
	   - If it has not, run `python scripts/download_model.py`

	4. "Data Not Found"
	   - Verify that the data files exist in the `data/` directory
	   - Check file permissions
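
For the "Data Not Found" case, both checks (existence and read permission) can be automated. The sketch below uses the dataset paths listed earlier in this guide; `check_data_files` is a hypothetical helper, not one of the repository's scripts.

```python
import os

DATA_FILES = [
    "data/lora_dataset_20250829_113330.jsonl",
    "data/lora_dataset_20250910_145055.jsonl",
    "data/textilindo_training_data.jsonl",
    "data/training_data.jsonl",
]


def check_data_files(paths=DATA_FILES):
    """Map each path to 'ok', 'missing', or 'unreadable'."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            report[path] = "unreadable"
        else:
            report[path] = "ok"
    return report
```

Run it from the repository root so that the relative `data/` paths resolve correctly.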

	## 📚 Next Steps

	1. Start with Quick Training: Test the setup with `quick_train.py`
	2. Monitor Resources: Use `/train/status` to check available resources
	3. External Training: For full training, use external resources
	4. Model Upload: Upload trained models to Hugging Face Hub
	5. Integration: Use uploaded models in your space

	## 🎉 Success Indicators

	- ✅ Training completes without errors
	- ✅ Model saves to `./models/` directory
	- ✅ Test responses are generated
	- ✅ Chat interface works with trained model
	- ✅ API endpoints respond correctly

	---

	Happy Training! 🚀