ACE-Step Transcriber

Description

ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.

Key Features

🌍 50+ Languages Support - Covers major world languages and regional dialects
🎤 Speech Transcription - Accurately transcribes spoken content
🎵 Singing Voice Transcription - Specialized in lyrics transcription with musical structure annotations
🏷️ Structure Annotation - Automatically identifies song sections (verse, chorus, bridge, etc.)

Usage

The usage is the same as Qwen2.5 Omni-7B.

Prompt Format

Use the following prompt to transcribe audio:

*Task* Transcribe this audio in detail
<audio>

Output Format

The model outputs structured content in the following format:

# Languages
<language_code>

# Lyrics
[Section Tag - Optional Instrument]

<transcribed content>
...

Example Output

# Languages
en

# Lyrics
[Intro - Acoustic Guitar]

[Verse 1]
Walking down the empty street tonight
Stars are shining oh so bright
...

[Chorus]
This is where we belong
Singing our favorite song
...

Supported Section Tags

[Intro], [Outro]
[Verse 1], [Verse 2], etc.
[Chorus], [Pre-Chorus], [Post-Chorus]
[Bridge]
[Guitar Interlude], [Instrumental]
[Spoken]

Supported Languages (50+)

The model supports transcription in over 50 languages, including but not limited to:

Region	Languages
East Asia	Chinese (zh), Japanese (ja), Korean (ko)
Southeast Asia	Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl)
South Asia	Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur)
Europe	English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr)
Middle East	Arabic (ar), Hebrew (he), Persian (fa)
Others	And many more regional languages...

Use Cases

Music Production - Transcribe reference tracks for lyrics extraction
Dataset Creation - Generate high-quality labeled data for music AI models
Accessibility - Create subtitles and captions for audio content
Music Analysis - Extract structural information from songs

Downloads last month: 4,364

Inference Providers NEW

Audio-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for ACE-Step/acestep-transcriber

ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Paper • 2602.00744 • Published Jan 31 • 11