
🤖 Textilindo AI Training Guide for Hugging Face Spaces

🚀 Training Options on Hugging Face Spaces

Option 1: Quick Training (Recommended for HF Spaces)

Use the lightweight training script, which is sized to fit within the resource limits of HF Spaces.

Access Training Interface:

  • Visit: https://harismlnaslm-Textilindo-AI.hf.space/train
  • Click "Start Lightweight Training"
  • Monitor progress in the training log

Manual Training:

python quick_train.py

Option 2: Use Existing Scripts

Run the full training scripts (may be resource-intensive):

# Check if training is ready
python scripts/check_training_ready.py

# Run the optimized training script
python scripts/train_textilindo_ai_optimized.py

# Test the trained model
python scripts/test_textilindo_ai.py

Option 3: External Training + Upload

Train on external resources and upload the model:

  1. Train locally or in the cloud:

    python scripts/train_textilindo_ai.py
    
  2. Upload the trained model to the HF Hub:

    huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
    
  3. Use the uploaded model in your space
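The upload step can also be scripted in Python with the `huggingface_hub` library. This is a minimal sketch, not part of the repository: `upload_model` is a hypothetical helper, the repo id and folder are the placeholders from the CLI command in step 2, and it assumes you are already authenticated (for example via `huggingface-cli login`).

```python
from typing import Optional


def upload_model(folder: str, repo_id: str, api: Optional[object] = None) -> None:
    """Create the model repo if needed, then upload a local folder to it.

    Hypothetical helper; `api` can be replaced by a stub in tests.
    """
    if api is None:
        # Imported lazily so the helper can be exercised without the library installed.
        from huggingface_hub import HfApi

        api = HfApi()
    # exist_ok=True makes repeated uploads to the same repo safe.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")


if __name__ == "__main__":
    upload_model("./models/trained-model", "your-username/textilindo-trained-model")
```

Inside the Space, a full causal-LM checkpoint uploaded this way can then be loaded with `transformers`' `AutoModelForCausalLM.from_pretrained("your-username/textilindo-trained-model")`; a LoRA adapter would need to be loaded on top of its base model instead.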

🔧 Training Configuration

For HF Spaces (Limited Resources):

  • Model: distilgpt2 (small, fast)
  • Batch Size: 1
  • Epochs: 1
  • Max Length: 128 tokens
  • Training Time: ~5 minutes

For External Training (Full Resources):

  • Model: meta-llama/Llama-3.1-8B-Instruct
  • Batch Size: 4-8
  • Epochs: 3
  • Max Length: 2048 tokens
  • Training Time: Hours
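The two profiles above can be captured as plain data so a script picks one explicitly instead of hard-coding values. This is a hypothetical sketch: `TrainingProfile`, `HF_SPACES`, `EXTERNAL` and `pick_profile` are illustrative names, not the repository's actual configuration, and choosing the profile by CUDA availability is an assumption. The `tokens_per_step` property shows why the external profile needs far more memory: a fully padded batch holds batch size × max length tokens (128 versus 8192 here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingProfile:
    model_name: str
    batch_size: int
    epochs: int
    max_length: int  # maximum sequence length in tokens

    @property
    def tokens_per_step(self) -> int:
        # Tokens in one batch when every sequence is padded to max_length.
        return self.batch_size * self.max_length


# Values taken from the guide; EXTERNAL uses the low end of the 4-8 batch range.
HF_SPACES = TrainingProfile("distilgpt2", batch_size=1, epochs=1, max_length=128)
EXTERNAL = TrainingProfile(
    "meta-llama/Llama-3.1-8B-Instruct", batch_size=4, epochs=3, max_length=2048
)


def has_cuda() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_profile(gpu_available: bool) -> TrainingProfile:
    # Assumption: a GPU means the full profile, otherwise the lightweight one.
    return EXTERNAL if gpu_available else HF_SPACES


if __name__ == "__main__":
    profile = pick_profile(has_cuda())
    print(profile, profile.tokens_per_step)
```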

📊 Training Data

Your space includes these training datasets:

  • data/lora_dataset_20250829_113330.jsonl (33 samples)
  • data/lora_dataset_20250910_145055.jsonl
  • data/textilindo_training_data.jsonl
  • data/training_data.jsonl
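Before training, it helps to confirm that each file is valid JSONL, meaning one JSON object per line. The sketch below counts records and reports line numbers that fail to parse. `summarize_jsonl` is a hypothetical helper, and it deliberately checks only that each line decodes to a JSON object, since the guide does not specify the record fields.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, bad_line_numbers) for a JSONL file.

    Blank lines are skipped; a line counts as bad if it is not valid
    JSON or does not decode to a JSON object.
    """
    count, bad = 0, []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append(lineno)
                continue
            if isinstance(record, dict):
                count += 1
            else:
                bad.append(lineno)
    return count, bad


if __name__ == "__main__":
    for path in sorted(Path("data").glob("*.jsonl")):
        count, bad = summarize_jsonl(path)
        print(f"{path}: {count} records, bad lines: {bad or 'none'}")
```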

🎯 Training Endpoints

Web Interface:

  • Training UI: /train
  • Start Training: POST /train/start
  • Check Status: GET /train/status
  • View Data: GET /train/data

API Usage:

# Start training
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"

# Check training status and resources
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"

# View training data
curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
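The same calls can be made from Python using only the standard library. A minimal sketch, assuming the endpoints accept bodiless requests exactly as the `curl` commands send them; `build_request` and `call` are hypothetical helper names, and the fallback to raw text exists because the guide does not document the response format.

```python
import json
import urllib.request

BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for one of the /train endpoints (no body, like the curl examples)."""
    return urllib.request.Request(BASE_URL + path, method=method)


def call(path: str, method: str = "GET", timeout: float = 30.0):
    """Send the request and return parsed JSON, or the raw text if it is not JSON."""
    with urllib.request.urlopen(build_request(path, method), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    print(call("/train/status"))  # check training status and resources
    print(call("/train/data"))  # view training data
    # print(call("/train/start", "POST"))  # uncomment to actually start training
```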

⚠️ Limitations of HF Spaces Training

Resource Constraints:

  • CPU Only: No GPU acceleration
  • Memory: Limited to ~4GB RAM
  • Time: 5-minute timeout for training
  • Storage: Limited disk space
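To stay inside the 5-minute limit, a training loop can check a deadline and stop early, leaving time to save the model, instead of being killed mid-step. This is a hypothetical sketch: `train_with_budget`, the 30-second safety margin and the injectable `clock` are assumptions, not part of `quick_train.py`.

```python
import time
from typing import Callable


def train_with_budget(
    total_steps: int,
    train_step: Callable[[int], None],
    budget_seconds: float = 300.0,  # the 5-minute limit described above
    safety_margin: float = 30.0,  # time reserved for saving the model
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run up to total_steps steps, stopping once the deadline has passed.

    Returns the number of steps completed. The check happens before each
    step, so a single slow step can still overrun; keep steps short.
    """
    deadline = clock() + budget_seconds - safety_margin
    completed = 0
    for step in range(total_steps):
        if clock() >= deadline:
            break
        train_step(step)
        completed += 1
    return completed
```

Passing a fake `clock` makes the stopping logic testable without actually waiting five minutes.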

Recommended Approach:

  1. Quick Demo Training: Use quick_train.py for testing
  2. Full Training: Use external resources (Google Colab, AWS, etc.)
  3. Model Upload: Upload externally trained models to the HF Hub

🚀 External Training Options

Google Colab (Free GPU):

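# 1. Upload your training data to the Colab session
# 2. Run training (prefix shell commands with ! in a Colab cell)
!python scripts/train_textilindo_ai.py
# 3. Download the trained model, or upload it directly to the HF Hub
!huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model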

Local Training:

# Setup environment
python scripts/setup_textilindo_training.py

# Download model
python scripts/download_model.py

# Run training
python scripts/train_textilindo_ai.py

# Test model
python scripts/test_textilindo_ai.py

Cloud Training (AWS/GCP):

# Use the monitoring script
python scripts/train_with_monitoring.py

📈 Training Progress Monitoring

On HF Spaces:

  • Check the training log in the web interface
  • Use /train/status endpoint for resource monitoring

External Training:

# Use monitoring script
python scripts/train_with_monitoring.py

# Check logs
tail -f logs/training.log
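Where `tail -f` is unavailable (for example on Windows), the same follow behaviour can be sketched in Python. Hypothetical helper: `follow` polls the file for newly appended complete lines; the `from_start` and `max_idle_polls` parameters are illustrative additions so the loop can also run to completion. It does not handle log rotation.

```python
import os
import time
from typing import Iterator, Optional


def follow(
    path: str,
    poll_interval: float = 1.0,
    from_start: bool = False,
    max_idle_polls: Optional[int] = None,
) -> Iterator[str]:
    """Yield complete lines appended to `path`, like `tail -f`.

    A trailing line without a newline is held back until it is finished.
    With max_idle_polls set, stop after that many consecutive empty polls.
    """
    with open(path, encoding="utf-8") as f:
        if not from_start:
            f.seek(0, os.SEEK_END)
        idle = 0
        while True:
            position = f.tell()
            line = f.readline()
            if line.endswith("\n"):
                idle = 0
                yield line[:-1]
                continue
            # No new data, or only a partial line: rewind and wait.
            f.seek(position)
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            idle += 1
            time.sleep(poll_interval)


if __name__ == "__main__":
    for line in follow("logs/training.log"):
        print(line)
```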

🧪 Testing Trained Models

Quick Test:

python quick_train.py  # Includes testing

Full Testing:

python scripts/test_textilindo_ai.py
python scripts/test_model.py

API Testing:

# Test chat endpoint
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "dimana lokasi textilindo?"}'
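A Python equivalent of this `curl` call, extended into a small smoke test over several questions. Assumptions: the endpoint takes the `{"message": ...}` JSON body shown above and replies with JSON; `build_chat_request`, `ask` and `smoke_test` are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, Dict, List

CHAT_URL = "https://harismlnaslm-Textilindo-AI.hf.space/chat"


def build_chat_request(message: str) -> urllib.request.Request:
    """POST {"message": ...} as JSON, mirroring the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str, timeout: float = 60.0):
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test(questions: List[str], send: Callable[[str], object] = ask) -> Dict[str, object]:
    """Send each question and record the reply, or the error message on failure."""
    results = {}
    for question in questions:
        try:
            results[question] = send(question)
        except Exception as exc:  # keep going so one failure doesn't hide the rest
            results[question] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    for question, reply in smoke_test(["dimana lokasi textilindo?"]).items():
        print(question, "->", reply)
```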

🔧 Troubleshooting

Common Issues:

  1. "Out of Memory"

    • Use smaller models (distilgpt2)
    • Reduce batch size
    • Use external training
  2. "Training Timeout"

    • Training on HF Spaces is limited to 5 minutes
    • Use external resources for full training
  3. "Model Not Found"

    • Check that the model has been downloaded
    • Run python scripts/download_model.py
  4. "Data Not Found"

    • Verify data files exist in data/ directory
    • Check file permissions
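A quick pre-flight check for the "Data Not Found" case, using the dataset paths listed under Training Data. A sketch with a hypothetical helper name, `check_data_files`; the paths are relative, so it assumes it is run from the repository root.

```python
import os
from pathlib import Path
from typing import Dict, Iterable

DATA_FILES = [
    "data/lora_dataset_20250829_113330.jsonl",
    "data/lora_dataset_20250910_145055.jsonl",
    "data/textilindo_training_data.jsonl",
    "data/training_data.jsonl",
]


def check_data_files(paths: Iterable[str]) -> Dict[str, str]:
    """Map each problematic path to a short reason; an empty dict means all good."""
    problems = {}
    for raw in paths:
        path = Path(raw)
        if not path.exists():
            problems[raw] = "missing"
        elif not path.is_file():
            problems[raw] = "not a regular file"
        elif not os.access(path, os.R_OK):
            problems[raw] = "not readable (check permissions)"
    return problems


if __name__ == "__main__":
    print(check_data_files(DATA_FILES) or "all data files present and readable")
```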

📚 Next Steps

  1. Start with Quick Training: Test the setup with quick_train.py
  2. Monitor Resources: Use /train/status to check available resources
  3. External Training: For full training, use external resources
  4. Model Upload: Upload trained models to Hugging Face Hub
  5. Integration: Use uploaded models in your space

🎉 Success Indicators

  • ✅ Training completes without errors
  • ✅ Model saves to ./models/ directory
  • ✅ Test responses are generated
  • ✅ Chat interface works with trained model
  • ✅ API endpoints respond correctly
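The "Model saves to ./models/ directory" indicator can be checked automatically. This sketch assumes the training scripts save with Hugging Face `save_pretrained`, so a finished model directory holds `config.json` (or `adapter_config.json` for a LoRA adapter) plus a weights file; the guide does not confirm that layout, and `find_saved_models` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

CONFIG_FILES = {"config.json", "adapter_config.json"}
WEIGHT_FILES = {
    "model.safetensors",
    "pytorch_model.bin",
    "model.safetensors.index.json",  # sharded checkpoints
    "pytorch_model.bin.index.json",
    "adapter_model.safetensors",  # LoRA adapters
    "adapter_model.bin",
}


def find_saved_models(root: str = "models") -> List[Path]:
    """Return directories under `root` that hold a config file plus weights."""
    base = Path(root)
    if not base.is_dir():
        return []
    candidates = [base] + sorted(p for p in base.rglob("*") if p.is_dir())
    found = []
    for directory in candidates:
        names = {child.name for child in directory.iterdir()}
        if names & CONFIG_FILES and names & WEIGHT_FILES:
            found.append(directory)
    return found


if __name__ == "__main__":
    models = find_saved_models("./models")
    print("\n".join(map(str, models)) or "no saved model found in ./models")
```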

Happy Training! 🚀