# Full Pipeline — Gradio API Reference

Gradio automatically exposes every event handler as an HTTP endpoint. This document covers the endpoints useful for programmatic integration, with examples in Python, JavaScript, and curl.

---

## Base URL

```
https://garchenarchive-archiveai.hf.space
```

When running locally, the base URL is `http://localhost:7860`.

### Auto-generated schema

Every running Gradio app publishes its full schema at:

```
GET {base_url}/info
```

You can also browse interactive API docs in the Gradio UI by clicking **"Use via API"** in the footer, or by visiting `{base_url}/docs`.

---

## Client libraries (recommended)

Prefer the official Gradio clients over raw HTTP — they handle file uploads, streaming, and session state automatically.

**Python**

```bash
pip install gradio_client
```

**JavaScript / Node**

```bash
npm install @gradio/client
```

---

## Authentication

This app does **not** use HTTP-level authentication. The Gemini API key is passed as a plain parameter on each call that requires it (`gemini_api_key`). Keep it out of client-side code — proxy calls through your backend.

---

## Endpoints

### 1. Run the full pipeline

Chains Speech-to-Text → Translation → Text-to-Speech in one call.
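The endpoint names and parameter ordering documented below can be checked against the live schema. A minimal stdlib sketch of fetching it (the `named_endpoints` key matches Gradio's `/info` response shape as far as I know; verify against your app's actual output):

```python
import json
import urllib.request

BASE_URL = "https://garchenarchive-archiveai.hf.space"

def endpoint_names(api_info: dict) -> list[str]:
    """List the named endpoint paths from a /info-style payload."""
    return sorted(api_info.get("named_endpoints", {}).keys())

if __name__ == "__main__":
    # Fetch the auto-generated schema from the running app.
    with urllib.request.urlopen(f"{BASE_URL}/info") as resp:
        info = json.load(resp)
    print(endpoint_names(info))
```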
**API name:** `/run_pipeline`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `file_input` | `filepath \| null` | Audio, video, `.srt`, `.txt`, or `.json` file |
| 1 | `drive_url` | `string` | Google Drive share URL (alternative to file upload) |
| 2 | `do_stt` | `boolean` | Run Speech-to-Text stage |
| 3 | `do_translation` | `boolean` | Run Translation stage |
| 4 | `do_tts` | `boolean` | Run Text-to-Speech stage |
| 5 | `do_summary` | `boolean` | Generate a Gemini summary |
| 6 | `language` | `string` | STT language — `"English"`, `"Tibetan"`, `"Tibetan (Base)"`, or `"Both"` |
| 7 | `selected_speakers` | `string[]` | Speaker names to keep (empty = all speakers) |
| 8 | `speaker_threshold` | `float` | Speaker similarity threshold, 0.0–1.0 (default `0.5`) |
| 9 | `use_gemini_post_edit` | `boolean` | Correct transcription via Gemini — works on STT output and uploaded transcripts |
| 10 | `gemini_model` | `string` | Gemini model name (see [Models](#models)) |
| 11 | `min_clip_duration` | `float` | Minimum segment duration in seconds (default `3`); shorter segments are not split further |
| 12 | `max_clip_duration` | `float` | Maximum segment duration in seconds (default `30`); longer segments are split into chunks |
| 13 | `target_language` | `string` | Translation target language, e.g. `"English"`, `"French"` |
| 14 | `gemini_api_key` | `string` | Gemini API key |
| 15 | `voice_label` | `string` | TTS voice (see [Voices](#voices)) |
| 16 | `prose_speed` | `float` | Prose playback speed, 0.5–1.0 (default `1.0`) |
| 17 | `mantra_speed` | `float` | Mantra playback speed, 0.5–1.0 (default `0.75`) |
| 18 | `state` | `object` | Pass `null` to start fresh |

#### Outputs

The pipeline is a **streaming generator** — it yields intermediate status updates before the final result.
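Because the endpoint streams, the Python client's `predict` blocks until the last yield. To observe the intermediate status messages, `gradio_client` also offers `submit`, which returns a `Job` you can iterate. A sketch (the argument list is abbreviated, and `status_of` is my helper assuming each yield is a tuple of the six documented outputs in order):

```python
def status_of(update: tuple) -> str:
    """Each yield carries the six outputs in table order; status is index 1."""
    return update[1]

if __name__ == "__main__":
    from gradio_client import Client, handle_file  # pip install gradio_client

    client = Client("https://garchenarchive-archiveai.hf.space")
    job = client.submit(
        file_input=handle_file("/path/to/audio.mp3"),
        drive_url="",
        do_stt=True,
        # ... remaining keyword arguments as in the predict() example below ...
        api_name="/run_pipeline",
    )
    for update in job:           # intermediate yields as they arrive
        print(status_of(update))
    final = job.result()         # blocks until the generator finishes
```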
The last yielded value contains:

| Field | Type | Description |
|-------|------|-------------|
| `state` | `object` | Updated app state including all segments |
| `status` | `string` | Human-readable status message |
| `srt_download` | `filepath \| null` | Path to generated SRT file |
| `json_download` | `filepath \| null` | Path to generated JSON file |
| `summary` | `string` | Summary text (if requested) |
| `audio` | `[sample_rate, array] \| null` | Synthesized audio (if TTS enabled) |

#### Python example

```python
from gradio_client import Client, handle_file

client = Client("https://garchenarchive-archiveai.hf.space")

result = client.predict(
    file_input=handle_file("/path/to/audio.mp3"),
    drive_url="",
    do_stt=True,
    do_translation=True,
    do_tts=False,
    do_summary=False,
    language="Both",
    selected_speakers=[],
    speaker_threshold=0.5,
    use_gemini_post_edit=False,
    gemini_model="gemini-2.5-flash",
    min_clip_duration=3,
    max_clip_duration=30,
    target_language="English",
    gemini_api_key="AIza...",
    voice_label="Female: Sarah",
    prose_speed=1.0,
    mantra_speed=0.75,
    state=None,
    api_name="/run_pipeline",
)
```

#### JavaScript example

```js
import { Client } from "@gradio/client";

const client = await Client.connect("https://garchenarchive-archiveai.hf.space");

const result = await client.predict("/run_pipeline", {
  file_input: new Blob([audioBuffer], { type: "audio/mpeg" }),
  drive_url: "",
  do_stt: true,
  do_translation: true,
  do_tts: false,
  do_summary: false,
  language: "Both",
  selected_speakers: [],
  speaker_threshold: 0.5,
  use_gemini_post_edit: false,
  gemini_model: "gemini-2.5-flash",
  min_clip_duration: 3,
  max_clip_duration: 30,
  target_language: "English",
  gemini_api_key: "AIza...",
  voice_label: "Female: Sarah",
  prose_speed: 1.0,
  mantra_speed: 0.75,
  state: null,
});
```

---

### 2. Translate a single segment

Translates one piece of text and returns the translation. The simplest endpoint for one-off translation calls.
**API name:** `/translate_one_segment`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `source` | `string` | Source text to translate |
| 1 | `target_lang` | `string` | Target language, e.g. `"English"` |
| 2 | `api_key` | `string` | Gemini API key |
| 3 | `gemini_model` | `string` | Gemini model name |

#### Output

`string` — the translated text, or `[Translation error: ...]` on failure.

#### Python example

```python
translation = client.predict(
    source="རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
    target_lang="English",
    api_key="AIza...",
    gemini_model="gemini-2.5-flash",
    api_name="/translate_one_segment",
)
```

#### curl example

```bash
curl -X POST https://garchenarchive-archiveai.hf.space/run/predict \
  -H "Content-Type: application/json" \
  -d '{
    "fn_index": <fn_index>,
    "data": [
      "རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
      "English",
      "AIza...",
      "gemini-2.5-flash"
    ]
  }'
```

> **Note:** For raw HTTP calls, replace `<fn_index>` with the index reported by `GET /info` — the index is assigned at startup and depends on event registration order.

---

### 3. Translate all segments

Translates all segments currently in the app state and refreshes the segment editor.

**API name:** `/translate_all_segments`

This is primarily used by the UI. For programmatic use, calling `/translate_one_segment` in a loop gives you finer error handling per segment.

---

### 4. Synthesize audio

Runs TTS on the segments currently in state.
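As suggested under `/translate_all_segments` above, a per-segment translation loop gives finer error handling. A sketch of such a loop (the helper names are mine; the error check relies on the documented `[Translation error: ...]` format):

```python
def is_translation_error(text: str) -> bool:
    """The endpoint reports failures inline rather than raising."""
    return text.startswith("[Translation error:")

def translate_segments(client, segments, target_lang, api_key,
                       model="gemini-2.5-flash"):
    """Translate each segment via /translate_one_segment; failed
    segments keep an empty target instead of aborting the batch."""
    translated = []
    for seg in segments:
        result = client.predict(
            source=seg["source"],
            target_lang=target_lang,
            api_key=api_key,
            gemini_model=model,
            api_name="/translate_one_segment",
        )
        seg = dict(seg)  # don't mutate the caller's segment
        seg["target"] = "" if is_translation_error(result) else result
        translated.append(seg)
    return translated
```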
**API name:** `/handle_synthesize`

#### Inputs

| # | Parameter | Type | Description |
|---|-----------|------|-------------|
| 0 | `state` | `object` | App state containing segments |
| 1 | `voice_label` | `string` | Voice name (see [Voices](#voices)) |
| 2 | `prose_speed` | `float` | 0.5–1.0 |
| 3 | `mantra_speed` | `float` | 0.5–1.0 |
| 4–28 | `slot_sources[0..24]` | `string` | Source text for each visible slot |
| 29–53 | `slot_targets[0..24]` | `string` | Target text for each visible slot |

#### Output

`[sample_rate: int, samples: float[]]` — NumPy-style audio array as returned by Gradio's `gr.Audio(type="numpy")`.

---

### 5. Pronunciation editor

Three endpoints for managing the TTS pronunciation glossary.

#### Look up a word

**API name:** `/lookup_word`

Input: `word: string`

Output: `pronunciation: string` (empty if not found)

#### Save a pronunciation

**API name:** `/save_pronunciation`

Inputs: `word: string`, `pronunciation: string`

Output: `status: string`

#### Remove a pronunciation

**API name:** `/remove_pronunciation`

Input: `word: string`

Output: `[pronunciation: string, status: string]`

---

## Reference

### Models

| Value | Notes |
|-------|-------|
| `gemini-2.5-flash` | Default — fast, good quality |
| `gemini-2.5-pro` | Higher quality, slower |
| `gemini-3-flash-preview` | Preview |
| `gemini-3.1-pro-preview` | Preview |

The app automatically falls back through `gemini-2.5-flash → gemini-2.5-pro` if the requested model fails.
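Audio returned in the `[sample_rate, samples]` format described under `/handle_synthesize` can be written to disk with the standard library alone. A sketch (mono output and a float sample range of -1.0 to 1.0 are my assumptions; check your actual payload):

```python
import struct
import wave

def float_to_pcm16(samples) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian 16-bit PCM."""
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )

def save_wav(path: str, sample_rate: int, samples) -> None:
    """Write a mono 16-bit WAV file from a [sample_rate, samples] pair."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assuming mono synthesis output
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm16(samples))
```

Usage would be `sample_rate, samples = result` followed by `save_wav("out.wav", sample_rate, samples)`.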
### Voices

| Value | Voice |
|-------|-------|
| `Female: Sarah` | `af_sarah` |
| `Female: Heart` | `af_heart` |
| `Female: Alice` | `bf_alice` |
| `Female: Emma` | `bf_emma` |
| `Male: Adam` | `am_adam` |
| `Male: Onyx` | `am_onyx` |
| `Male: Daniel` | `bm_daniel` |
| `Male: George` | `bm_george` |

### Segment object schema

Segments are the core data structure passed through the pipeline:

```json
{
  "source": "Original transcription text",
  "target": "Translated text (empty string if not yet translated)",
  "timestamp": "00:00:01,000 --> 00:00:05,000"
}
```

`timestamp` follows SRT format and may be an empty string for plain-text inputs.

### Supported input file types

| Extension | Handled as |
|-----------|------------|
| `.mp3`, `.wav`, `.m4a`, `.mp4`, `.mov`, etc. | Audio/video — passed to STT |
| `.srt` | Subtitle file — parsed into segments |
| `.txt` | Plain text — each line becomes a segment |
| `.json` | Segment array — must match the segment object schema above |
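Since a `.json` input must match the segment object schema above, it may help to build and validate the file programmatically. A minimal sketch (helper names are mine):

```python
import json

REQUIRED_FIELDS = {"source", "target", "timestamp"}

def make_segment(source: str, target: str = "", timestamp: str = "") -> dict:
    """Build one segment matching the schema above."""
    return {"source": source, "target": target, "timestamp": timestamp}

def write_segments_json(path: str, segments: list) -> None:
    """Validate segments against the schema, then write a .json input file."""
    for i, seg in enumerate(segments):
        missing = REQUIRED_FIELDS - seg.keys()
        if missing:
            raise ValueError(f"segment {i} is missing fields: {sorted(missing)}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(segments, f, ensure_ascii=False, indent=2)
```

The resulting file can be passed as `file_input` to `/run_pipeline` with `do_stt=False`.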