# 🤖 Textilindo AI Training API Documentation

## 🚀 Pure API-Based Training System

This is a complete, API-only training system that trains on your data and configs using the free GPU tier on Hugging Face Spaces.

## 📡 API Endpoints

### 1. **Start Training**

```bash
POST /api/train/start
```

**Request Body:**

```json
{
  "model_name": "distilgpt2",
  "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
  "config_path": "configs/training_config.yaml",
  "max_samples": 10,
  "epochs": 1,
  "batch_size": 1,
  "learning_rate": 5e-5
}
```

**Response:**

```json
{
  "success": true,
  "message": "Training started successfully",
  "training_id": "train_20241025_120000",
  "status": "started"
}
```

### 2. **Check Training Status**

```bash
GET /api/train/status
```

**Response:**

```json
{
  "is_training": true,
  "progress": 45,
  "status": "training",
  "current_step": 5,
  "total_steps": 10,
  "loss": 2.34,
  "start_time": "2024-10-25T12:00:00",
  "error": null
}
```

### 3. **Get Training Data Info**

```bash
GET /api/train/data
```

**Response:**

```json
{
  "files": [
    {
      "name": "lora_dataset_20250829_113330.jsonl",
      "size": 12345,
      "lines": 33
    }
  ],
  "count": 4
}
```

### 4. **Check GPU Availability**

```bash
GET /api/train/gpu
```

**Response:**

```json
{
  "available": true,
  "count": 1,
  "name": "Tesla T4",
  "memory_gb": 15.0
}
```

### 5. **Test Trained Model**

```bash
POST /api/train/test
```

**Response:**

```json
{
  "success": true,
  "test_prompt": "Question: dimana lokasi textilindo? Answer:",
  "response": "Question: dimana lokasi textilindo? Answer: Textilindo berkantor pusat di Jl. Raya Prancis No.39, Kosambi Tim., Kec. Kosambi, Kabupaten Tangerang, Banten 15213",
  "model_path": "./models/textilindo-trained"
}
```

## 🧪 Testing the API

### 1. **Check GPU Availability**

```bash
curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"
```

### 2. **View Training Data**

```bash
curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/data"
```

### 3. **Start Training**

```bash
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "distilgpt2",
    "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
    "config_path": "configs/training_config.yaml",
    "max_samples": 10,
    "epochs": 1,
    "batch_size": 1,
    "learning_rate": 5e-5
  }'
```

### 4. **Monitor Training Progress**

```bash
curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"
```

### 5. **Test Trained Model**

```bash
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
```

## 🔧 Training Configuration

### Available Models:

- `distilgpt2` (82M parameters) - Small and fast, a good fit for the free tier
- `gpt2` (124M parameters) - The original GPT-2
- `microsoft/DialoGPT-small` (117M parameters) - Tuned for conversation

### Training Parameters:

- **max_samples**: Maximum number of training samples to use (10 on the free tier)
- **epochs**: Number of training epochs (1-3 recommended)
- **batch_size**: Batch size (1 on the free tier)
- **learning_rate**: Learning rate (5e-5 recommended)

## 🎯 Training Process

1. **Start Training**: POST to `/api/train/start`
2. **Monitor Progress**: GET `/api/train/status`
3. **Check GPU Usage**: GET `/api/train/gpu`
4. **Test Model**: POST `/api/train/test`

## 📊 Training Status Values

- `idle` - No training is running
- `starting` - Training is initializing
- `training` - Training is in progress
- `completed` - Training finished successfully
- `failed` - Training stopped with an error
- `stopped` - Training was stopped

## ⚡ GPU Usage

The API automatically detects a GPU and uses it when one is available:

- **GPU available**: Trains on the GPU with fp16 precision
- **CPU only**: Falls back to CPU training
- **Memory optimization**: Adjusts the batch size based on available memory

## 🔍 Error Handling

### Common Errors:

- `400` - Training already in progress
- `404` - Dataset or config file not found
- `500` - Training failed (check the logs)

### Error Response:

```json
{
  "detail": "Training already in progress"
}
```

## 📈 Training Monitoring

### Real-time Status:

- **Progress**: 0-100%
- **Current Step**: Current training step
- **Total Steps**: Total number of training steps
- **Loss**: Current training loss
- **GPU Usage**: GPU memory and utilization

### Training Logs:

Check the Space logs for detailed training information.

## 🚀 Quick Start Example

```bash
# 1. Check GPU
curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"

# 2. Start training
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "distilgpt2",
    "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
    "max_samples": 5,
    "epochs": 1
  }'

# 3. Monitor progress
curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"

# 4. Test when complete
curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
```

## 🎉 Success Indicators

- ✅ Training starts without errors
- ✅ GPU is detected and used
- ✅ Progress increases over time
- ✅ Model is saved to `./models/textilindo-trained`
- ✅ Test endpoint returns valid responses
- ✅ Chat interface works with the trained model

---

*Pure API training system - No HTML interfaces! 🚀*
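The Quick Start steps can be scripted so that polling stops on its own once a run ends. Below is a minimal Bash sketch. It assumes the endpoints, error codes, and status values documented here, that `curl` is installed, and that the status response carries `status` as a plain string. The function names (`start_training`, `fetch_status`, `extract_status`, `is_terminal`, `wait_for_training`) and the `BASE_URL` variable are hypothetical helpers, not part of the API.

```bash
#!/usr/bin/env bash
# Minimal client sketch for the training API (hypothetical helpers).
# Assumption: BASE_URL points at the Space; override it via the environment.
BASE_URL="${BASE_URL:-https://harismlnaslm-Textilindo-AI.hf.space}"

# POST /api/train/start with the given JSON body. Prints the response body
# and returns non-zero for any non-2xx HTTP code, such as 400, 404, or 500.
start_training() {
  local body="$1" tmp code
  tmp="$(mktemp)"
  code="$(curl -sS -o "$tmp" -w '%{http_code}' -X POST \
    "${BASE_URL}/api/train/start" \
    -H "Content-Type: application/json" \
    -d "$body")"
  cat "$tmp"
  echo
  rm -f "$tmp"
  [[ "$code" == 2* ]]
}

# GET /api/train/status and print the raw JSON response.
fetch_status() {
  curl -sS "${BASE_URL}/api/train/status"
}

# Read JSON on stdin and print the value of the first "status" field.
# Uses sed instead of jq so no extra tools are needed; assumes the value
# is a plain string without escaped quotes.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if the status means the run has ended.
is_terminal() {
  case "$1" in
    completed | failed | stopped) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every INTERVAL seconds (arg 1), at most MAX_POLLS times (arg 2).
# Prints each observed status; returns 0 only if the run reaches "completed".
wait_for_training() {
  local interval="${1:-10}" max_polls="${2:-60}" status="" i
  for ((i = 1; i <= max_polls; i++)); do
    status="$(fetch_status | extract_status)"
    echo "poll ${i}: ${status:-unknown}"
    if is_terminal "$status"; then
      if [[ "$status" == "completed" ]]; then
        return 0
      fi
      return 1
    fi
    sleep "$interval"
  done
  echo "gave up after ${max_polls} polls" >&2
  return 1
}
```

For example, after sourcing the script, `start_training '{"model_name": "distilgpt2", "dataset_path": "data/lora_dataset_20250829_113330.jsonl", "max_samples": 5, "epochs": 1}' && wait_for_training 15 40` starts a run and then checks its status every 15 seconds, up to 40 times. `idle` is deliberately not treated as terminal, so a run that never leaves `idle` simply stops at the poll limit. A request failure (for example a network error) shows up as `unknown` and does not end polling early.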