
# Full Pipeline — Gradio API Reference

Gradio automatically exposes every event handler as an HTTP endpoint. This document covers the endpoints useful for programmatic integration, with examples in Python, JavaScript, and curl.


## Base URL

```
https://garchenarchive-archiveai.hf.space
```

When running locally, the base URL is `http://localhost:7860`.

## Auto-generated schema

Every running Gradio app publishes its full schema at:

```
GET {base_url}/info
```

You can also browse interactive API docs in the Gradio UI by clicking "Use via API" in the footer, or by visiting `{base_url}/docs`.
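As a quick sanity check, you can fetch that schema and list the named endpoints. The sketch below assumes the `/info` payload contains a `named_endpoints` object keyed by endpoint name, which is true for recent Gradio versions but may vary:

```python
import json
import urllib.request

def endpoint_names(info: dict) -> list[str]:
    """Return the named endpoint paths from a Gradio /info payload, sorted."""
    return sorted(info.get("named_endpoints", {}).keys())

# Live usage (requires network):
#   with urllib.request.urlopen("https://garchenarchive-archiveai.hf.space/info") as resp:
#       print(endpoint_names(json.load(resp)))
```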


## Client libraries (recommended)

Prefer the official Gradio clients over raw HTTP — they handle file uploads, streaming, and session state automatically.

### Python

```bash
pip install gradio_client
```

### JavaScript / Node

```bash
npm install @gradio/client
```

## Authentication

This app does not use HTTP-level authentication. The Gemini API key is passed as a plain parameter (`gemini_api_key`) on each call that requires it. Keep it out of client-side code — proxy calls through your backend.


## Endpoints

### 1. Run the full pipeline

Chains Speech-to-Text → Translation → Text-to-Speech in one call.

**API name:** `/run_pipeline`

#### Inputs

| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | `file_input` | filepath \| null | Audio, video, `.srt`, `.txt`, or `.json` file |
| 1 | `drive_url` | string | Google Drive share URL (alternative to file upload) |
| 2 | `do_stt` | boolean | Run Speech-to-Text stage |
| 3 | `do_translation` | boolean | Run Translation stage |
| 4 | `do_tts` | boolean | Run Text-to-Speech stage |
| 5 | `do_summary` | boolean | Generate a Gemini summary |
| 6 | `language` | string | STT language — `"English"`, `"Tibetan"`, `"Tibetan (Base)"`, or `"Both"` |
| 7 | `selected_speakers` | string[] | Speaker names to keep (empty = all speakers) |
| 8 | `speaker_threshold` | float | Speaker similarity threshold, 0.0–1.0 (default 0.5) |
| 9 | `use_gemini_post_edit` | boolean | Correct transcription via Gemini — works on STT output and uploaded transcripts |
| 10 | `gemini_model` | string | Gemini model name (see Models) |
| 11 | `min_clip_duration` | float | Minimum segment duration in seconds (default 3); shorter segments are not split further |
| 12 | `max_clip_duration` | float | Maximum segment duration in seconds (default 30); longer segments are split into chunks |
| 13 | `target_language` | string | Translation target language, e.g. `"English"`, `"French"` |
| 14 | `gemini_api_key` | string | Gemini API key |
| 15 | `voice_label` | string | TTS voice (see Voices) |
| 16 | `prose_speed` | float | Prose playback speed, 0.5–1.0 (default 1.0) |
| 17 | `mantra_speed` | float | Mantra playback speed, 0.5–1.0 (default 0.75) |
| 18 | `state` | object | Pass `null` to start fresh |

#### Outputs

The pipeline is a streaming generator — it yields intermediate status updates before the final result. The last yielded value contains:

| Field | Type | Description |
|---|---|---|
| `state` | object | Updated app state including all segments |
| `status` | string | Human-readable status message |
| `srt_download` | filepath \| null | Path to generated SRT file |
| `json_download` | filepath \| null | Path to generated JSON file |
| `summary` | string | Summary text (if requested) |
| `audio` | [sample_rate, array] \| null | Synthesized audio (if TTS enabled) |

#### Python example

```python
from gradio_client import Client, handle_file

client = Client("https://garchenarchive-archiveai.hf.space")

result = client.predict(
    file_input=handle_file("/path/to/audio.mp3"),
    drive_url="",
    do_stt=True,
    do_translation=True,
    do_tts=False,
    do_summary=False,
    language="Both",
    selected_speakers=[],
    speaker_threshold=0.5,
    use_gemini_post_edit=False,
    gemini_model="gemini-2.5-flash",
    min_clip_duration=3,
    max_clip_duration=30,
    target_language="English",
    gemini_api_key="AIza...",
    voice_label="Female: Sarah",
    prose_speed=1.0,
    mantra_speed=0.75,
    state=None,
    api_name="/run_pipeline",
)
```

#### JavaScript example

```js
import { Client } from "@gradio/client";

const client = await Client.connect("https://garchenarchive-archiveai.hf.space");

const result = await client.predict("/run_pipeline", {
  file_input: new Blob([audioBuffer], { type: "audio/mpeg" }),
  drive_url: "",
  do_stt: true,
  do_translation: true,
  do_tts: false,
  do_summary: false,
  language: "Both",
  selected_speakers: [],
  speaker_threshold: 0.5,
  use_gemini_post_edit: false,
  gemini_model: "gemini-2.5-flash",
  min_clip_duration: 3,
  max_clip_duration: 30,
  target_language: "English",
  gemini_api_key: "AIza...",
  voice_label: "Female: Sarah",
  prose_speed: 1.0,
  mantra_speed: 0.75,
  state: null,
});
```
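Because `/run_pipeline` is a streaming generator, `client.predict()` only returns the final yield. To observe the intermediate status updates from Python, you can use the client's `submit()` API, which returns a `Job` you can iterate. A sketch, with argument values mirroring the Python example above:

```python
# Arguments for /run_pipeline, mirroring the Python example above.
run_pipeline_args = dict(
    file_input=None,          # or handle_file("/path/to/audio.mp3")
    drive_url="",
    do_stt=True,
    do_translation=True,
    do_tts=False,
    do_summary=False,
    language="Both",
    selected_speakers=[],
    speaker_threshold=0.5,
    use_gemini_post_edit=False,
    gemini_model="gemini-2.5-flash",
    min_clip_duration=3,
    max_clip_duration=30,
    target_language="English",
    gemini_api_key="AIza...",
    voice_label="Female: Sarah",
    prose_speed=1.0,
    mantra_speed=0.75,
    state=None,
)

# Live usage (requires network):
#   from gradio_client import Client
#   client = Client("https://garchenarchive-archiveai.hf.space")
#   job = client.submit(**run_pipeline_args, api_name="/run_pipeline")
#   for update in job:          # each intermediate yield from the generator
#       print(update)
#   final = job.result()        # the last yielded value
```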

### 2. Translate a single segment

Translates one piece of text and returns the translation. The simplest endpoint for one-off translation calls.

**API name:** `/translate_one_segment`

#### Inputs

| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | `source` | string | Source text to translate |
| 1 | `target_lang` | string | Target language, e.g. `"English"` |
| 2 | `api_key` | string | Gemini API key |
| 3 | `gemini_model` | string | Gemini model name |

#### Output

`string` — the translated text, or `[Translation error: ...]` on failure.

#### Python example

```python
translation = client.predict(
    source="རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
    target_lang="English",
    api_key="AIza...",
    gemini_model="gemini-2.5-flash",
    api_name="/translate_one_segment",
)
```

#### curl example

```bash
curl -X POST https://garchenarchive-archiveai.hf.space/run/predict \
  -H "Content-Type: application/json" \
  -d '{
    "fn_index": <fn_index>,
    "data": [
      "རང་གི་སེམས་ལ་བལྟ་བར་གྱིས།",
      "English",
      "AIza...",
      "gemini-2.5-flash"
    ]
  }'
```

> **Note:** For raw HTTP calls, look up the correct `fn_index` from `GET /info` — the index is assigned at startup and depends on event registration order.


### 3. Translate all segments

Translates all segments currently in the app state and refreshes the segment editor.

**API name:** `/translate_all_segments`

This is primarily used by the UI. For programmatic use, calling `/translate_one_segment` in a loop gives you finer error handling per segment.
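The per-segment loop can be sketched as follows. The `segments` list and the helper name are illustrative; the only contract assumed from the endpoint is that failures come back as a string starting with `[Translation error:`:

```python
def is_translation_error(result: str) -> bool:
    """The endpoint signals failure by returning a bracketed error string."""
    return result.startswith("[Translation error")

# Live usage (requires network and a gradio_client.Client instance):
#   for seg in segments:                         # list of segment dicts
#       out = client.predict(
#           source=seg["source"],
#           target_lang="English",
#           api_key="AIza...",
#           gemini_model="gemini-2.5-flash",
#           api_name="/translate_one_segment",
#       )
#       if is_translation_error(out):
#           print("translation failed for:", seg["source"][:40])
#       else:
#           seg["target"] = out
```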


### 4. Synthesize audio

Runs TTS on the segments currently in state.

**API name:** `/handle_synthesize`

#### Inputs

| # | Parameter | Type | Description |
|---|---|---|---|
| 0 | `state` | object | App state containing segments |
| 1 | `voice_label` | string | Voice name (see Voices) |
| 2 | `prose_speed` | float | 0.5–1.0 |
| 3 | `mantra_speed` | float | 0.5–1.0 |
| 4–28 | `slot_sources[0..24]` | string | Source text for each visible slot |
| 29–53 | `slot_targets[0..24]` | string | Target text for each visible slot |

#### Output

`[sample_rate: int, samples: float[]]` — a NumPy-style audio array, as returned by Gradio's `gr.Audio(type="numpy")`.
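To save that `[sample_rate, samples]` pair locally, here is a minimal sketch using only the standard library. It assumes mono float samples in [-1.0, 1.0]; the helper name is my own:

```python
import struct
import wave

def save_wav(path: str, sample_rate: int, samples: list[float]) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 16-bit PCM
        wav.setframerate(sample_rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
        wav.writeframes(frames)
```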


### 5. Pronunciation editor

Three endpoints for managing the TTS pronunciation glossary.

#### Look up a word

**API name:** `/lookup_word`

Input: `word: string`
Output: `pronunciation: string` (empty if not found)

#### Save a pronunciation

**API name:** `/save_pronunciation`

Inputs: `word: string`, `pronunciation: string`
Output: `status: string`

#### Remove a pronunciation

**API name:** `/remove_pronunciation`

Input: `word: string`
Output: `[pronunciation: string, status: string]`


## Reference

### Models

| Value | Notes |
|---|---|
| `gemini-2.5-flash` | Default — fast, good quality |
| `gemini-2.5-pro` | Higher quality, slower |
| `gemini-3-flash-preview` | Preview |
| `gemini-3.1-pro-preview` | Preview |

The app automatically falls back through gemini-2.5-flash → gemini-2.5-pro if the requested model fails.
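If you call Gemini directly rather than through the app, the same fallback order can be reproduced client-side. A sketch; `call_model` is a hypothetical stand-in for whatever Gemini call you make:

```python
FALLBACK_ORDER = ["gemini-2.5-flash", "gemini-2.5-pro"]

def call_with_fallback(call_model, requested: str):
    """Try the requested model, then each fallback in order; re-raise the last error."""
    tried = [requested] + [m for m in FALLBACK_ORDER if m != requested]
    last_exc = None
    for model in tried:
        try:
            return call_model(model)
        except Exception as exc:   # in practice, catch the specific API error type
            last_exc = exc
    raise last_exc
```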

### Voices

| Value | Voice |
|---|---|
| `Female: Sarah` | `af_sarah` |
| `Female: Heart` | `af_heart` |
| `Female: Alice` | `bf_alice` |
| `Female: Emma` | `bf_emma` |
| `Male: Adam` | `am_adam` |
| `Male: Onyx` | `am_onyx` |
| `Male: Daniel` | `bm_daniel` |
| `Male: George` | `bm_george` |

### Segment object schema

Segments are the core data structure passed through the pipeline:

```json
{
  "source": "Original transcription text",
  "target": "Translated text (empty string if not yet translated)",
  "timestamp": "00:00:01,000 --> 00:00:05,000"
}
```

`timestamp` follows SRT format and may be an empty string for plain-text inputs.
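As an example of working with this schema, here is a minimal sketch that serializes a segment list back into SRT text. Segments without a timestamp are skipped, and the helper name is my own, not part of the app:

```python
def segments_to_srt(segments: list[dict]) -> str:
    """Render timestamped segments as an SRT document, numbering from 1."""
    blocks = []
    n = 1
    for seg in segments:
        if not seg.get("timestamp"):
            continue  # plain-text inputs may carry empty timestamps
        text = seg.get("target") or seg.get("source", "")
        blocks.append(f"{n}\n{seg['timestamp']}\n{text}")
        n += 1
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```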

### Supported input file types

| Extension | Handled as |
|---|---|
| `.mp3`, `.wav`, `.m4a`, `.mp4`, `.mov`, etc. | Audio/video — passed to STT |
| `.srt` | Subtitle file — parsed into segments |
| `.txt` | Plain text — each line becomes a segment |
| `.json` | Segment array — must match the segment object schema above |
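For client-side validation before uploading, the dispatch table above could be mirrored in a small helper. The category names are my own, and anything outside the three text-like extensions is assumed to be routed to STT as audio/video:

```python
from pathlib import Path

# Text-like extensions the pipeline parses directly instead of sending to STT.
TEXT_LIKE = {".srt": "subtitle", ".txt": "plain-text", ".json": "segment-json"}

def classify_input(filename: str) -> str:
    """Guess how the pipeline will treat a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    return TEXT_LIKE.get(ext, "audio/video")
```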