| --- |
| license: apache-2.0 |
| base_model: |
| - coqui/XTTS-v2 |
| --- |
| # Auralis |
|
|
| ## Model Details |
|
|
| **Model Name:** Auralis |
|
|
| **Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) |
|
|
| **License:** |
| - Auralis: Apache 2.0 |
| - Base model (XTTS-v2 components): [Coqui Public Model License](https://coqui.ai/cpml) |
| |
| **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi |
| |
| **Developed by:** [AstraMind.ai](https://www.astramind.ai) |
| |
| **GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main) |
| |
| **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks. |
| |
| --- |
| |
| ## Model Description |
| |
| Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling. |
| |
| ### Key Features: |
| - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes. |
| - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090. |
| - **Scalable:** Handles multiple requests simultaneously. |
| - **Streaming:** Seamlessly processes long texts in a streaming format. |
| - **Custom Voices:** Enables voice cloning from short reference audio. |
| |
| --- |
| |
| ## Quick Start |
| |
| ```python |
| from auralis import TTS, TTSRequest |
| |
| # Initialize the model |
| tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt") |
| |
| # Create a TTS request |
| request = TTSRequest( |
|     text="Hello Earth! This is Auralis speaking.", |
|     speaker_files=["reference.wav"] |
| ) |
| |
| # Generate speech |
| output = tts.generate_speech(request) |
| output.save("output.wav") |
| ``` |
| |
| --- |
| |
| ## Ebook Generation |
| |
| Auralis converts ebooks into audio at high speed. For a complete Python script, see [vocalize_a_ebook.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py). |
| |
| ```python |
| from auralis import TTS, TTSRequest, AudioPreprocessingConfig |
| |
| # Initialize the model once and reuse it (same call as in Quick Start) |
| tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt") |
| |
| def process_book(chapter_file: str, speaker_file: str): |
|     # Read the chapter text |
|     with open(chapter_file, "r", encoding="utf-8") as f: |
|         chapter = f.read() |
| |
|     # You can pass the whole book; Auralis takes care of splitting it |
|     request = TTSRequest( |
|         text=chapter, |
|         speaker_files=[speaker_file], |
|         audio_config=AudioPreprocessingConfig( |
|             enhance_speech=True, |
|             normalize=True |
|         ) |
|     ) |
| |
|     # Generate the audio, play it back, and save it to disk |
|     output = tts.generate_speech(request) |
|     output.play() |
|     output.save("chapter_output.wav") |
| |
| # Example usage |
| process_book("chapter1.txt", "reference_voice.wav") |
| ``` |
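The comment in the example above notes that Auralis splits long input itself, so manual chunking is not required. Purely as an illustration of what sentence-aware splitting can look like, here is a minimal standard-library sketch. `split_text` and its `max_chars` limit are hypothetical names chosen for this example and are not Auralis's actual implementation.

```python
import re


def split_text(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept as its own (oversized) chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("One. Two. Three.", max_chars=9):
        print(repr(chunk))
```

Splitting on sentence boundaries rather than at fixed character offsets avoids cutting words or clauses in half, which tends to matter for prosody in synthesized speech.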
| |
| --- |
| |
| ## Intended Use |
| |
| Auralis is designed for: |
| - **Content Creators:** Generate audiobooks, podcasts, or voiceovers. |
| - **Developers:** Integrate TTS into applications via a simple Python API. |
| - **Accessibility:** Provide audio versions of digital content for people with visual impairments or reading difficulties. |
| - **Multilingual Scenarios:** Convert text to speech in multiple supported languages. |
| |
| --- |
| |
| ## Performance |
| |
| **Benchmarks on NVIDIA RTX 3090:** |
| - Short phrases (<100 characters): ~1 second |
| - Medium texts (<1,000 characters): ~5-10 seconds |
| - Full books (~100,000 characters): ~10 minutes |
| |
| **Memory Usage:** |
| - Base VRAM: ~4GB |
| - Peak VRAM: ~10GB |
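
As a rough back-of-the-envelope check, the full-book figure above (~100,000 characters in ~10 minutes) implies a throughput of about 167 characters per second. The sketch below turns that into a time estimate. `estimate_seconds` is a hypothetical helper, and it assumes throughput scales linearly with text length, which the benchmarks do not guarantee: the short and medium figures suggest lower throughput on small inputs.

```python
# Figures taken from the full-book benchmark quoted above.
BOOK_CHARS = 100_000
BOOK_SECONDS = 10 * 60

# Implied throughput: about 166.7 characters per second.
CHARS_PER_SECOND = BOOK_CHARS / BOOK_SECONDS


def estimate_seconds(num_chars: int) -> float:
    """Hypothetical helper: linear extrapolation from the book benchmark."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / CHARS_PER_SECOND


if __name__ == "__main__":
    for n in (1_000, 100_000, 500_000):
        print(f"{n:>7} chars -> ~{estimate_seconds(n) / 60:.1f} min")
```

Treat the result as an order-of-magnitude guide only; actual timings depend on the GPU, the batch size, and the text itself.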
| |
| --- |
| |
| ## Model Features |
| |
| 1. **Speed & Efficiency:** |
| - Smart batching for rapid processing of long texts. |
| - Memory-optimized for consumer GPUs. |
| |
| 2. **Easy Integration:** |
| - Python API with support for synchronous and asynchronous workflows. |
| - Streaming mode for continuous playback during generation. |
| |
| 3. **Audio Quality Enhancements:** |
| - Background noise reduction. |
| - Voice clarity and volume normalization. |
| - Customizable audio preprocessing. |
| |
| 4. **Multilingual Support:** |
| - Automatic language detection. |
| - High-quality speech in the 17 languages listed under Language Support. |
| |
| 5. **Customization:** |
| - Voice cloning using short reference clips. |
| - Adjustable parameters for tone, pacing, and language. |
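
The asynchronous workflow and concurrent request handling mentioned above follow a common pattern: submit several requests at once and cap how many are in flight. The sketch below shows that pattern with the standard library only. `fake_synthesize` is a stand-in stub, not an Auralis function, and `max_concurrent` is an assumed tuning knob; swap in the real asynchronous call from the Auralis documentation when adapting it.

```python
import asyncio


async def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real async TTS call; NOT part of Auralis."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return text.encode("utf-8")  # placeholder for audio bytes


async def synthesize_all(texts: list[str], max_concurrent: int = 4) -> list[bytes]:
    """Run all requests concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text: str) -> bytes:
        async with semaphore:
            return await fake_synthesize(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


if __name__ == "__main__":
    results = asyncio.run(synthesize_all(["Hello", "Hola", "Bonjour"]))
    print([len(r) for r in results])
```

Capping concurrency with a semaphore keeps peak memory predictable, which matters when many requests share a single consumer GPU.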
| |
| --- |
| |
| ## Limitations & Ethical Considerations |
| |
| - **Voice Cloning Risks:** Auralis supports voice cloning, which can be misused, for example to impersonate someone. Use it responsibly and obtain consent from the person whose voice you clone. |
| - **Accent Limitations:** Although Auralis is robust across many languages, accent and intonation may vary with the input. |
| |
| --- |
| |
| ## Citation |
| |
| If you use Auralis in your research or projects, please cite: |
| |
| ```bibtex |
| @misc{auralis2024, |
| author = {AstraMind AI}, |
| title = {Auralis: High-Performance Text-to-Speech Engine}, |
| year = {2024}, |
| url = {https://huggingface.co/AstraMindAI/auralis} |
| } |
| ``` |