Full Pipeline — Gradio API Reference
Gradio automatically exposes every event handler as an HTTP endpoint. This document covers the endpoints useful for programmatic integration, with examples in Python, JavaScript, and curl.
Base URL
https://garchenarchive-archiveai.hf.space
When running locally the base URL is http://localhost:7860.
Auto-generated schema
Every running Gradio app publishes its full schema at:
GET {base_url}/info
You can also browse interactive API docs in the Gradio UI by clicking "Use via API" in the footer, or visiting {base_url}/docs.
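The schema can also be fetched programmatically. A minimal sketch using only the standard library (the exact response keys, such as `named_endpoints`, follow the Gradio schema convention and are not specific to this app):

```python
import json
from urllib.request import urlopen

BASE_URL = "https://garchenarchive-archiveai.hf.space"

def fetch_api_info(base_url: str = BASE_URL) -> dict:
    """Fetch the auto-generated endpoint schema for a running Gradio app."""
    with urlopen(f"{base_url}/info") as resp:
        return json.load(resp)

# info = fetch_api_info()
# print(list(info["named_endpoints"]))  # "/run_pipeline", "/translate_one_segment", ...
```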
Client libraries (recommended)
Prefer the official Gradio clients over raw HTTP — they handle file uploads, streaming, and session state automatically.
Python
```bash
pip install gradio_client
```
JavaScript / Node
```bash
npm install @gradio/client
```
Authentication
This app does not use HTTP-level authentication. The Gemini API key is passed as a plain parameter on each call that requires it (gemini_api_key). Keep it out of client-side code — proxy calls through your backend.
Endpoints
1. Run the full pipeline
Chains Speech-to-Text → Translation → Text-to-Speech in one call.
API name: /run_pipeline
Inputs
| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | file_input | filepath \| null | Audio, video, .srt, .txt, or .json file |
| 1 | drive_url | string | Google Drive share URL (alternative to file upload) |
| 2 | do_stt | boolean | Run Speech-to-Text stage |
| 3 | do_translation | boolean | Run Translation stage |
| 4 | do_tts | boolean | Run Text-to-Speech stage |
| 5 | do_summary | boolean | Generate a Gemini summary |
| 6 | language | string | STT language — "English", "Tibetan", "Tibetan (Base)", or "Both" |
| 7 | selected_speakers | string[] | Speaker names to keep (empty = all speakers) |
| 8 | speaker_threshold | float | Speaker similarity threshold, 0.0–1.0 (default 0.5) |
| 9 | use_gemini_post_edit | boolean | Correct transcription via Gemini — works on STT output and uploaded transcripts |
| 10 | gemini_model | string | Gemini model name (see Models) |
| 11 | min_clip_duration | float | Minimum segment duration in seconds (default 3); shorter segments are not split further |
| 12 | max_clip_duration | float | Maximum segment duration in seconds (default 30); longer segments are split into chunks |
| 13 | target_language | string | Translation target language, e.g. "English", "French" |
| 14 | gemini_api_key | string | Gemini API key |
| 15 | voice_label | string | TTS voice (see Voices) |
| 16 | prose_speed | float | Prose playback speed, 0.5–1.0 (default 1.0) |
| 17 | mantra_speed | float | Mantra playback speed, 0.5–1.0 (default 0.75) |
| 18 | state | object | Pass null to start fresh |
Outputs
The pipeline is a streaming generator — it yields intermediate status updates before the final result. The last yielded value contains:
| Field | Type | Description |
|---|---|---|
| state | object | Updated app state including all segments |
| status | string | Human-readable status message |
| srt_download | filepath \| null | Path to generated SRT file |
| json_download | filepath \| null | Path to generated JSON file |
| summary | string | Summary text (if requested) |
| audio | [sample_rate, array] \| null | Synthesized audio (if TTS enabled) |
Python example
```python
from gradio_client import Client, handle_file

client = Client("https://garchenarchive-archiveai.hf.space")

result = client.predict(
    file_input=handle_file("/path/to/audio.mp3"),
    drive_url="",
    do_stt=True,
    do_translation=True,
    do_tts=False,
    do_summary=False,
    language="Both",
    selected_speakers=[],
    speaker_threshold=0.5,
    use_gemini_post_edit=False,
    gemini_model="gemini-2.5-flash",
    min_clip_duration=3,
    max_clip_duration=30,
    target_language="English",
    gemini_api_key="AIza...",
    voice_label="Female: Sarah",
    prose_speed=1.0,
    mantra_speed=0.75,
    state=None,
    api_name="/run_pipeline",
)
```
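Because the pipeline is a generator endpoint, `client.predict` returns only the final value. To observe the intermediate status updates as they stream, use `client.submit` and iterate over the returned job. A hedged sketch (`run_pipeline_streaming` is our own helper name; the import is deferred so the function can be defined even where `gradio_client` is not installed):

```python
def run_pipeline_streaming(audio_path: str, api_key: str):
    """Run /run_pipeline, printing each intermediate status update.

    Returns the final outputs tuple yielded by the generator.
    """
    # Deferred import so this helper can be defined without gradio_client.
    from gradio_client import Client, handle_file

    client = Client("https://garchenarchive-archiveai.hf.space")
    job = client.submit(
        file_input=handle_file(audio_path),
        drive_url="",
        do_stt=True,
        do_translation=True,
        do_tts=False,
        do_summary=False,
        language="Both",
        selected_speakers=[],
        speaker_threshold=0.5,
        use_gemini_post_edit=False,
        gemini_model="gemini-2.5-flash",
        min_clip_duration=3,
        max_clip_duration=30,
        target_language="English",
        gemini_api_key=api_key,
        voice_label="Female: Sarah",
        prose_speed=1.0,
        mantra_speed=0.75,
        state=None,
        api_name="/run_pipeline",
    )
    # Each yield is a full outputs tuple; assuming the output order in the
    # table above, the status message is the second element.
    for outputs in job:
        print(outputs[1])
    return job.result()
```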
JavaScript example
import { Client } from "@gradio/client";
const client = await Client.connect("https://garchenarchive-archiveai.hf.space");
const result = await client.predict("/run_pipeline", {
file_input: new Blob([audioBuffer], { type: "audio/mpeg" }),
drive_url: "",
do_stt: true,
do_translation: true,
do_tts: false,
do_summary: false,
language: "Both",
selected_speakers: [],
speaker_threshold: 0.5,
use_gemini_post_edit: false,
gemini_model: "gemini-2.5-flash",
min_clip_duration: 3,
max_clip_duration: 30,
target_language: "English",
gemini_api_key: "AIza...",
voice_label: "Female: Sarah",
prose_speed: 1.0,
mantra_speed: 0.75,
state: null,
});
2. Translate a single segment
Translates one piece of text and returns the translation. The simplest endpoint for one-off translation calls.
API name: /translate_one_segment
Inputs
| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | source | string | Source text to translate |
| 1 | target_lang | string | Target language, e.g. "English" |
| 2 | api_key | string | Gemini API key |
| 3 | gemini_model | string | Gemini model name |
Output
string — the translated text, or [Translation error: ...] on failure.
Python example
```python
translation = client.predict(
    source="རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
    target_lang="English",
    api_key="AIza...",
    gemini_model="gemini-2.5-flash",
    api_name="/translate_one_segment",
)
```
curl example
```bash
curl -X POST https://garchenarchive-archiveai.hf.space/run/predict \
  -H "Content-Type: application/json" \
  -d '{
    "fn_index": <fn_index>,
    "data": [
      "རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
      "English",
      "AIza...",
      "gemini-2.5-flash"
    ]
  }'
```
Note: For raw HTTP calls, look up the correct fn_index from GET /info — the index is assigned at startup and depends on event registration order.
3. Translate all segments
Translates all segments currently in the app state and refreshes the segment editor.
API name: /translate_all_segments
This is primarily used by the UI. For programmatic use, calling /translate_one_segment in a loop gives you finer error handling per segment.
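A hedged sketch of that loop (`translate_segments` is our own helper, not an app endpoint; `client` is a connected `gradio_client.Client`). Mirroring the endpoint's own convention, a failed segment gets a bracketed error marker instead of aborting the batch:

```python
def translate_segments(client, segments, target_lang, api_key,
                       model="gemini-2.5-flash"):
    """Translate each segment dict in place via /translate_one_segment.

    One failing segment does not abort the rest; its target is set to a
    bracketed error marker, matching the endpoint's own error format.
    """
    for seg in segments:
        try:
            seg["target"] = client.predict(
                source=seg["source"],
                target_lang=target_lang,
                api_key=api_key,
                gemini_model=model,
                api_name="/translate_one_segment",
            )
        except Exception as exc:
            seg["target"] = f"[Translation error: {exc}]"
    return segments
```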
4. Synthesize audio
Runs TTS on the segments currently in state.
API name: /handle_synthesize
Inputs
| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | state | object | App state containing segments |
| 1 | voice_label | string | Voice name (see Voices) |
| 2 | prose_speed | float | 0.5–1.0 |
| 3 | mantra_speed | float | 0.5–1.0 |
| 4–28 | slot_sources[0..24] | string | Source text for each visible slot |
| 29–53 | slot_targets[0..24] | string | Target text for each visible slot |
Output
[sample_rate: int, samples: float[]] — NumPy-style audio array as returned by Gradio's gr.Audio(type="numpy").
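If you receive this tuple through the Python client, the standard library's `wave` module plus NumPy is enough to persist it. A sketch assuming mono float samples in [-1, 1] (`save_numpy_audio` is a hypothetical helper, not part of the app):

```python
import wave

import numpy as np

def save_numpy_audio(path: str, sample_rate: int, samples) -> None:
    """Write a (sample_rate, float array) audio pair to a 16-bit mono WAV."""
    pcm = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    pcm16 = (pcm * 32767).astype(np.int16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm16.tobytes())
```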
5. Pronunciation editor
Three endpoints for managing the TTS pronunciation glossary.
Look up a word
API name: /lookup_word
Input: word: string
Output: pronunciation: string (empty if not found)
Save a pronunciation
API name: /save_pronunciation
Inputs: word: string, pronunciation: string
Output: status: string
Remove a pronunciation
API name: /remove_pronunciation
Input: word: string
Output: [pronunciation: string, status: string]
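For scripted glossary maintenance the three endpoints compose naturally. A hedged sketch (`upsert_pronunciation` is our own helper name; `client` is a connected `gradio_client.Client`) that skips the save when the stored pronunciation already matches:

```python
def upsert_pronunciation(client, word: str, pronunciation: str) -> str:
    """Save a pronunciation unless it is already stored for the word.

    Returns the app's status string, or "unchanged" when no save was needed.
    """
    existing = client.predict(word=word, api_name="/lookup_word")
    if existing == pronunciation:
        return "unchanged"
    return client.predict(
        word=word,
        pronunciation=pronunciation,
        api_name="/save_pronunciation",
    )
```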
Reference
Models
| Value | Notes |
|---|---|
| gemini-2.5-flash | Default — fast, good quality |
| gemini-2.5-pro | Higher quality, slower |
| gemini-3-flash-preview | Preview |
| gemini-3.1-pro-preview | Preview |
The app automatically falls back through gemini-2.5-flash → gemini-2.5-pro if the requested model fails.
Voices
| Value | Voice |
|---|---|
| Female: Sarah | af_sarah |
| Female: Heart | af_heart |
| Female: Alice | bf_alice |
| Female: Emma | bf_emma |
| Male: Adam | am_adam |
| Male: Onyx | am_onyx |
| Male: Daniel | bm_daniel |
| Male: George | bm_george |
Segment object schema
Segments are the core data structure passed through the pipeline:
```json
{
  "source": "Original transcription text",
  "target": "Translated text (empty string if not yet translated)",
  "timestamp": "00:00:01,000 --> 00:00:05,000"
}
```
timestamp follows SRT format and may be an empty string for plain-text inputs.
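When preparing a .json file for upload, every segment should carry all three keys. A sketch (`write_segments_json` is a hypothetical helper, not part of the app) that fills in the optional fields with the empty-string defaults described above:

```python
import json

def write_segments_json(segments: list, path: str) -> None:
    """Write segment dicts to a .json file matching the pipeline's schema.

    Missing target/timestamp keys default to "", meaning "not yet
    translated" and "no timing information" respectively.
    """
    normalized = [
        {
            "source": seg["source"],
            "target": seg.get("target", ""),
            "timestamp": seg.get("timestamp", ""),
        }
        for seg in segments
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(normalized, f, ensure_ascii=False, indent=2)
```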
Supported input file types
| Extension | Handled as |
|---|---|
| .mp3, .wav, .m4a, .mp4, .mov, etc. | Audio/video — passed to STT |
| .srt | Subtitle file — parsed into segments |
| .txt | Plain text — each line becomes a segment |
| .json | Segment array — must match the segment object schema above |