ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation
Paper
•
2602.00744
•
Published
•
1
ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.
The usage is the same as Qwen2.5 Omni-7B.
Use the following prompt to transcribe audio:
*Task* Transcribe this audio in detail
<audio>
The model outputs structured content in the following format:
# Languages
<language_code>
# Lyrics
[Section Tag - Optional Instrument]
<transcribed content>
...
# Languages
en
# Lyrics
[Intro - Acoustic Guitar]
[Verse 1]
Walking down the empty street tonight
Stars are shining oh so bright
...
[Chorus]
This is where we belong
Singing our favorite song
...
[Intro], [Outro][Verse 1], [Verse 2], etc.[Chorus], [Pre-Chorus], [Post-Chorus][Bridge][Guitar Interlude], [Instrumental][Spoken]The model supports transcription in over 50 languages, including but not limited to:
| Region | Languages |
|---|---|
| East Asia | Chinese (zh), Japanese (ja), Korean (ko) |
| Southeast Asia | Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) |
| South Asia | Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) |
| Europe | English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) |
| Middle East | Arabic (ar), Hebrew (he), Persian (fa) |
| Others | And many more regional languages... |