# Full Pipeline — Gradio API Reference

Gradio automatically exposes every event handler as an HTTP endpoint. This document covers the endpoints useful for programmatic integration, with examples in Python, JavaScript, and curl.

---

## Base URL

```
https://garchenarchive-archiveai.hf.space
```

When running locally, the base URL is `http://localhost:7860`.

### Auto-generated schema

Every running Gradio app publishes its full schema at:

```
GET {base_url}/info
```

You can also browse interactive API docs in the Gradio UI by clicking **"Use via API"** in the footer, or by visiting `{base_url}/docs`.

---

## Client libraries (recommended)

Prefer the official Gradio clients over raw HTTP — they handle file uploads, streaming, and session state automatically.

**Python**

```bash
pip install gradio_client
```

**JavaScript / Node**

```bash
npm install @gradio/client
```

---

## Authentication

This app does **not** use HTTP-level authentication. The Gemini API key is passed as a plain parameter on each call that requires it (`gemini_api_key`). Keep it out of client-side code — proxy calls through your backend.

---

## Endpoints

### 1. Run the full pipeline

Chains Speech-to-Text → Translation → Text-to-Speech in one call.
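The endpoint names and parameter ordering documented below can be checked against the live schema. A minimal stdlib sketch of fetching it (the `named_endpoints` key matches Gradio's `/info` response shape as far as I know; verify against your app's actual output):

```python
import json
import urllib.request

BASE_URL = "https://garchenarchive-archiveai.hf.space"

def endpoint_names(api_info: dict) -> list[str]:
    """List the named endpoint paths from a /info-style payload."""
    return sorted(api_info.get("named_endpoints", {}).keys())

if __name__ == "__main__":
    # Fetch the auto-generated schema from the running app.
    with urllib.request.urlopen(f"{BASE_URL}/info") as resp:
        info = json.load(resp)
    print(endpoint_names(info))
```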
**API name:** `/run_pipeline`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `file_input` | `filepath \| null` | Audio, video, `.srt`, `.txt`, or `.json` file |
| 1 | `drive_url` | `string` | Google Drive share URL (alternative to file upload) |
| 2 | `do_stt` | `boolean` | Run Speech-to-Text stage |
| 3 | `do_translation` | `boolean` | Run Translation stage |
| 4 | `do_tts` | `boolean` | Run Text-to-Speech stage |
| 5 | `do_summary` | `boolean` | Generate a Gemini summary |
| 6 | `language` | `string` | STT language — `"English"`, `"Tibetan"`, `"Tibetan (Base)"`, or `"Both"` |
| 7 | `selected_speakers` | `string[]` | Speaker names to keep (empty = all speakers) |
| 8 | `speaker_threshold` | `float` | Speaker similarity threshold, 0.0–1.0 (default `0.5`) |
| 9 | `use_gemini_post_edit` | `boolean` | Correct transcription via Gemini — works on STT output and uploaded transcripts |
| 10 | `gemini_model` | `string` | Gemini model name (see [Models](#models)) |
| 11 | `min_clip_duration` | `float` | Minimum segment duration in seconds (default `3`); shorter segments are not split further |
| 12 | `max_clip_duration` | `float` | Maximum segment duration in seconds (default `30`); longer segments are split into chunks |
| 13 | `target_language` | `string` | Translation target language, e.g. `"English"`, `"French"` |
| 14 | `gemini_api_key` | `string` | Gemini API key |
| 15 | `voice_label` | `string` | TTS voice (see [Voices](#voices)) |
| 16 | `prose_speed` | `float` | Prose playback speed, 0.5–1.0 (default `1.0`) |
| 17 | `mantra_speed` | `float` | Mantra playback speed, 0.5–1.0 (default `0.75`) |
| 18 | `state` | `object` | Pass `null` to start fresh |

#### Outputs

The pipeline is a **streaming generator** — it yields intermediate status updates before the final result.
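Because the endpoint streams, the Python client's `predict` blocks until the last yield. To observe the intermediate status messages, `gradio_client` also offers `submit`, which returns a `Job` you can iterate. A sketch (the argument list is abbreviated, and `status_of` is my helper assuming each yield is a tuple of the six documented outputs in order):

```python
def status_of(update: tuple) -> str:
    """Each yield carries the six outputs in table order; status is index 1."""
    return update[1]

if __name__ == "__main__":
    from gradio_client import Client, handle_file  # pip install gradio_client

    client = Client("https://garchenarchive-archiveai.hf.space")
    job = client.submit(
        file_input=handle_file("/path/to/audio.mp3"),
        drive_url="",
        do_stt=True,
        # ... remaining keyword arguments as in the predict() example below ...
        api_name="/run_pipeline",
    )
    for update in job:           # intermediate yields as they arrive
        print(status_of(update))
    final = job.result()         # blocks until the generator finishes
```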
The last yielded value contains:

| Field | Type | Description |
|-------|------|-------------|
| `state` | `object` | Updated app state including all segments |
| `status` | `string` | Human-readable status message |
| `srt_download` | `filepath \| null` | Path to generated SRT file |
| `json_download` | `filepath \| null` | Path to generated JSON file |
| `summary` | `string` | Summary text (if requested) |
| `audio` | `[sample_rate, array] \| null` | Synthesized audio (if TTS enabled) |

#### Python example

```python
from gradio_client import Client, handle_file

client = Client("https://garchenarchive-archiveai.hf.space")

result = client.predict(
    file_input=handle_file("/path/to/audio.mp3"),
    drive_url="",
    do_stt=True,
    do_translation=True,
    do_tts=False,
    do_summary=False,
    language="Both",
    selected_speakers=[],
    speaker_threshold=0.5,
    use_gemini_post_edit=False,
    gemini_model="gemini-2.5-flash",
    min_clip_duration=3,
    max_clip_duration=30,
    target_language="English",
    gemini_api_key="AIza...",
    voice_label="Female: Sarah",
    prose_speed=1.0,
    mantra_speed=0.75,
    state=None,
    api_name="/run_pipeline",
)
```

#### JavaScript example

```js
import { Client } from "@gradio/client";

const client = await Client.connect("https://garchenarchive-archiveai.hf.space");

const result = await client.predict("/run_pipeline", {
  file_input: new Blob([audioBuffer], { type: "audio/mpeg" }),
  drive_url: "",
  do_stt: true,
  do_translation: true,
  do_tts: false,
  do_summary: false,
  language: "Both",
  selected_speakers: [],
  speaker_threshold: 0.5,
  use_gemini_post_edit: false,
  gemini_model: "gemini-2.5-flash",
  min_clip_duration: 3,
  max_clip_duration: 30,
  target_language: "English",
  gemini_api_key: "AIza...",
  voice_label: "Female: Sarah",
  prose_speed: 1.0,
  mantra_speed: 0.75,
  state: null,
});
```

---

### 2. Translate a single segment

Translates one piece of text and returns the translation. The simplest endpoint for one-off translation calls.
**API name:** `/translate_one_segment`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `source` | `string` | Source text to translate |
| 1 | `target_lang` | `string` | Target language, e.g. `"English"` |
| 2 | `api_key` | `string` | Gemini API key |
| 3 | `gemini_model` | `string` | Gemini model name |

#### Output

`string` — the translated text, or `[Translation error: ...]` on failure.

#### Python example

```python
translation = client.predict(
    source="རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
    target_lang="English",
    api_key="AIza...",
    gemini_model="gemini-2.5-flash",
    api_name="/translate_one_segment",
)
```

#### curl example

```bash
curl -X POST https://garchenarchive-archiveai.hf.space/run/predict \
  -H "Content-Type: application/json" \
  -d '{
    "fn_index": <fn_index>,
    "data": [
      "རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
      "English",
      "AIza...",
      "gemini-2.5-flash"
    ]
  }'
```

> **Note:** For raw HTTP calls, replace `<fn_index>` with the index reported by `GET /info` — the index is assigned at startup and depends on event registration order.

---

### 3. Translate all segments

Translates all segments currently in the app state and refreshes the segment editor.

**API name:** `/translate_all_segments`

This is primarily used by the UI. For programmatic use, calling `/translate_one_segment` in a loop gives you finer error handling per segment.

---

### 4. Synthesize audio

Runs TTS on the segments currently in state.
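As suggested under `/translate_all_segments` above, a per-segment translation loop gives finer error handling. A sketch of such a loop (the helper names are mine; the error check relies on the documented `[Translation error: ...]` format):

```python
def is_translation_error(text: str) -> bool:
    """The endpoint reports failures inline rather than raising."""
    return text.startswith("[Translation error:")

def translate_segments(client, segments, target_lang, api_key,
                       model="gemini-2.5-flash"):
    """Translate each segment via /translate_one_segment; failed
    segments keep an empty target instead of aborting the batch."""
    translated = []
    for seg in segments:
        result = client.predict(
            source=seg["source"],
            target_lang=target_lang,
            api_key=api_key,
            gemini_model=model,
            api_name="/translate_one_segment",
        )
        seg = dict(seg)  # don't mutate the caller's segment
        seg["target"] = "" if is_translation_error(result) else result
        translated.append(seg)
    return translated
```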
**API name:** `/handle_synthesize`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `state` | `object` | App state containing segments |
| 1 | `voice_label` | `string` | Voice name (see [Voices](#voices)) |
| 2 | `prose_speed` | `float` | 0.5–1.0 |
| 3 | `mantra_speed` | `float` | 0.5–1.0 |
| 4–28 | `slot_sources[0..24]` | `string` | Source text for each visible slot |
| 29–53 | `slot_targets[0..24]` | `string` | Target text for each visible slot |

#### Output

`[sample_rate: int, samples: float[]]` — NumPy-style audio array as returned by Gradio's `gr.Audio(type="numpy")`.

---

### 5. Pronunciation editor

Three endpoints for managing the TTS pronunciation glossary.

#### Look up a word

**API name:** `/lookup_word`

Input: `word: string`

Output: `pronunciation: string` (empty if not found)

#### Save a pronunciation

**API name:** `/save_pronunciation`

Inputs: `word: string`, `pronunciation: string`

Output: `status: string`

#### Remove a pronunciation

**API name:** `/remove_pronunciation`

Input: `word: string`

Output: `[pronunciation: string, status: string]`

---

## Reference

### Models

| Value | Notes |
|-------|-------|
| `gemini-2.5-flash` | Default — fast, good quality |
| `gemini-2.5-pro` | Higher quality, slower |
| `gemini-3-flash-preview` | Preview |
| `gemini-3.1-pro-preview` | Preview |

The app automatically falls back through `gemini-2.5-flash → gemini-2.5-pro` if the requested model fails.
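Audio returned in the `[sample_rate, samples]` format described under `/handle_synthesize` can be written to disk with the standard library alone. A sketch (mono output and a float sample range of -1.0 to 1.0 are my assumptions; check your actual payload):

```python
import struct
import wave

def float_to_pcm16(samples) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian 16-bit PCM."""
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )

def save_wav(path: str, sample_rate: int, samples) -> None:
    """Write a mono 16-bit WAV file from a [sample_rate, samples] pair."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assuming mono synthesis output
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm16(samples))
```

Usage would be `sample_rate, samples = result` followed by `save_wav("out.wav", sample_rate, samples)`.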
### Voices

| Value | Voice |
|-------|-------|
| `Female: Sarah` | `af_sarah` |
| `Female: Heart` | `af_heart` |
| `Female: Alice` | `bf_alice` |
| `Female: Emma` | `bf_emma` |
| `Male: Adam` | `am_adam` |
| `Male: Onyx` | `am_onyx` |
| `Male: Daniel` | `bm_daniel` |
| `Male: George` | `bm_george` |

### Segment object schema

Segments are the core data structure passed through the pipeline:

```json
{
  "source": "Original transcription text",
  "target": "Translated text (empty string if not yet translated)",
  "timestamp": "00:00:01,000 --> 00:00:05,000"
}
```

`timestamp` follows SRT format and may be an empty string for plain-text inputs.

### Supported input file types

| Extension | Handled as |
|-----------|------------|
| `.mp3`, `.wav`, `.m4a`, `.mp4`, `.mov`, etc. | Audio/video — passed to STT |
| `.srt` | Subtitle file — parsed into segments |
| `.txt` | Plain text — each line becomes a segment |
| `.json` | Segment array — must match the segment object schema above |
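Since a `.json` input must match the segment object schema above, it may help to build and validate the file programmatically. A minimal sketch (helper names are mine):

```python
import json

REQUIRED_FIELDS = {"source", "target", "timestamp"}

def make_segment(source: str, target: str = "", timestamp: str = "") -> dict:
    """Build one segment matching the schema above."""
    return {"source": source, "target": target, "timestamp": timestamp}

def write_segments_json(path: str, segments: list) -> None:
    """Validate segments against the schema, then write a .json input file."""
    for i, seg in enumerate(segments):
        missing = REQUIRED_FIELDS - seg.keys()
        if missing:
            raise ValueError(f"segment {i} is missing fields: {sorted(missing)}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(segments, f, ensure_ascii=False, indent=2)
```

The resulting file can be passed as `file_input` to `/run_pipeline` with `do_stt=False`.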