Add pipeline tag
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,9 +1,9 @@
 ---
-license: other
-license_name: license-term-of-universal-audio-tokenizer
 language:
 - en
 - zh
+license: other
+license_name: license-term-of-universal-audio-tokenizer
 tags:
 - audio
 - audio-tokenizer
@@ -11,11 +11,12 @@ tags:
 - speech
 - sound
 - music
+pipeline_tag: audio-to-audio
 ---
+
 # Universal Audio Tokenizer: Empowering Semantic Speech Tokenizers with General Audio Perception
 
-**Universal Audio Tokenizer** is a compact single-codebook audio tokenizer that unifies general audio perception and
-linguistic alignment for downstream Audio-LLMs.
+**Universal Audio Tokenizer** (UniAudio-Token) is a compact single-codebook audio tokenizer that unifies general audio perception and linguistic alignment for downstream Audio-LLMs.
 
 [Paper](https://arxiv.org/abs/2605.31521) | 💻 [GitHub](https://github.com/Tencent/Universal_Audio_Tokenizer)
 
@@ -108,6 +109,7 @@ Also, you can directly run the inference code snippet below:
 ```python
 import os
 import torch
+from huggingface_hub import snapshot_download
 from transformers import WhisperFeatureExtractor
 from src.model.modeling_whisper import WhisperVQEncoder
 from src.model.flow_inference import AudioDecoder
@@ -167,7 +169,7 @@ Our Universal Audio Tokenizer achieves high-quality speech reconstruction with a
 
 ### Superior Downstream Audio-LLM Performance
 
-When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks
+When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks.
 
 #### Audio Understanding
 
@@ -208,4 +210,4 @@ If you find our code or model useful for your research, please cite:
 
 ## License
 
-This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).
+This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).
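
For reviewers who want to sanity-check the metadata change, the following is a minimal sketch that parses the model card header as it should look after this PR and confirms the new `pipeline_tag` value. It is an illustration only: `parse_front_matter` and `CARD_HEADER` are hypothetical names, the parser handles just the flat subset of YAML visible in this diff (scalars and simple lists), and README line 10 falls outside the diff hunks, so that tag entry is omitted from the sample rather than guessed.

```python
# Sketch only: a tiny parser for the flat YAML subset used in the card header.
# Assumption: values are either "key: scalar" or "key:" followed by "- item" lines.

# Header as reconstructed from the "+" side of the diff. README line 10 is not
# shown in the diff, so one tag entry is intentionally missing here.
CARD_HEADER = """\
---
language:
- en
- zh
license: other
license_name: license-term-of-universal-audio-tokenizer
tags:
- audio
- audio-tokenizer
- speech
- sound
- music
pipeline_tag: audio-to-audio
---

# Universal Audio Tokenizer
"""


def parse_front_matter(text: str) -> dict:
    """Return the header metadata as a dict, or {} if no complete header exists."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    current_list = None  # list currently receiving "- item" entries, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value  # scalar entry ends any open list
            current_list = None
        else:
            current_list = meta.setdefault(key, [])  # list entry starts here
    return {}  # header was never closed


metadata = parse_front_matter(CARD_HEADER)
print(metadata.get("pipeline_tag"))
```

For anything beyond a quick check like this, a full YAML parser would be the safer choice, since the sketch ignores quoting, nesting, and comments.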