tencent/Universal_Audio_Tokenizer

audio-tokenizer

speech-tokenizer

Add pipeline tag #1

by nielsr (HF Staff), opened 3 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +8 -6

@@ -1,9 +1,9 @@
 ---
-license: other
-license_name: license-term-of-universal-audio-tokenizer
 language:
 - en
 - zh
 tags:
 - audio
 - audio-tokenizer
@@ -11,11 +11,12 @@ tags:
 - speech
 - sound
 - music
 ---
 # Universal Audio Tokenizer: Empowering Semantic Speech Tokenizers with General Audio Perception
-**Universal Audio Tokenizer** is a compact single-codebook audio tokenizer that unifies general audio perception and
-linguistic alignment for downstream Audio-LLMs.
 📄 [Paper](https://arxiv.org/abs/2605.31521) | 💻 [GitHub](https://github.com/Tencent/Universal_Audio_Tokenizer)
@@ -108,6 +109,7 @@ Also, you can directly run the inference code snippet below:
 ```python
 import os
 import torch
 from transformers import WhisperFeatureExtractor
 from src.model.modeling_whisper import WhisperVQEncoder
 from src.model.flow_inference import AudioDecoder
@@ -167,7 +169,7 @@ Our Universal Audio Tokenizer achieves high-quality speech reconstruction with a
 ### Superior Downstream Audio-LLM Performance
-When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks, demonstrating its effectiveness as a unified audio input/output interface for Audio-LLMs.
 #### Audio Understanding
@@ -208,4 +210,4 @@ If you find our code or model useful for your research, please cite:
 ## License
-This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).

 ---
 language:
 - en
 - zh
+license: other
+license_name: license-term-of-universal-audio-tokenizer
 tags:
 - audio
 - audio-tokenizer
 - speech
 - sound
 - music
+pipeline_tag: audio-to-audio
 ---
 # Universal Audio Tokenizer: Empowering Semantic Speech Tokenizers with General Audio Perception
+**Universal Audio Tokenizer** (UniAudio-Token) is a compact single-codebook audio tokenizer that unifies general audio perception and linguistic alignment for downstream Audio-LLMs.
 📄 [Paper](https://arxiv.org/abs/2605.31521) | 💻 [GitHub](https://github.com/Tencent/Universal_Audio_Tokenizer)
 ```python
 import os
 import torch
+from huggingface_hub import snapshot_download
 from transformers import WhisperFeatureExtractor
 from src.model.modeling_whisper import WhisperVQEncoder
 from src.model.flow_inference import AudioDecoder
 ### Superior Downstream Audio-LLM Performance
+When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks.
 #### Audio Understanding
 ## License
+This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).
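
The metadata block on the new side of this diff is simple enough to sanity-check with standard-library Python. The sketch below is only an illustration and is not part of the PR: `parse_front_matter` and `README_HEAD` are hypothetical names, the parser handles just the `key: value` and `- item` forms that appear here rather than full YAML, and the tag on README line 10, which falls between the two hunks and is not shown, is left out.

```python
def parse_front_matter(text):
    """Parse the simple front matter shown in this diff.

    Supports only `key: value` scalars and `key:` followed by `- item`
    lists. This is an assumption-laden sketch, not a general YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None  # key whose list items we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the metadata block
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []
                current_key = key
    return meta


# Front matter as it appears on the new side of the diff.
# The tag line between the two hunks is not shown in the diff and is omitted.
README_HEAD = """---
language:
- en
- zh
license: other
license_name: license-term-of-universal-audio-tokenizer
tags:
- audio
- audio-tokenizer
- speech
- sound
- music
pipeline_tag: audio-to-audio
---
# Universal Audio Tokenizer
"""

meta = parse_front_matter(README_HEAD)
print(meta["pipeline_tag"])  # audio-to-audio
print(meta["language"])      # ['en', 'zh']
```

A check like this confirms that moving the `license` keys below `language` and adding `pipeline_tag` leaves every key inside the `---` delimiters.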