hugofloresgarcia

Add HF_TOKEN authentication support for model access

2760947 23 days ago

title: Stable Audio Open Small - 4 Variations
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: other

Stable Audio Open Small - 4 Variations

Generate up to 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.

Model Information

Model: stabilityai/stable-audio-open-small

Type: Latent diffusion model (DiT) with autoencoder
Sample Rate: 44.1 kHz
Format: Stereo audio
Max Duration: 11 seconds
License: Stability AI Community License

Features

4 Variations: Generate 4 different audio variations from a single prompt
Text-to-Audio: Simple text prompt interface
Variable Duration: Control audio length (1-11 seconds)
Fast Generation: Uses the pingpong sampler with only 8 steps

Setup

This model requires accepting the license agreement on Hugging Face. To use this Space:

1. Accept the model license: Visit stabilityai/stable-audio-open-small on Hugging Face and accept the license agreement.
2. Create an access token: In your Hugging Face account, go to Settings > Access Tokens and create a token with "read" permissions.
3. Add the token to the Space: In your Space settings, go to "Variables and secrets" and add a new secret:
- Name: HF_TOKEN
- Value: Your access token
- Add it as a secret rather than a variable, so the value stays private
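
On the application side, the secret is exposed to the running Space as an environment variable named HF_TOKEN. The sketch below shows one way the app could read it and log in before downloading the gated model. It is an illustration, not the Space's actual app.py: the helper names `resolve_hf_token` and `authenticate` are hypothetical, and the only library call assumed is `huggingface_hub.login(token=...)`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN secret, or raise a readable error if it is missing."""
    # Hypothetical helper: defaults to the process environment, but accepts
    # any mapping so it can be exercised without touching real secrets.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret under "
            "'Variables and secrets' in the Space settings."
        )
    return token


def authenticate() -> None:
    """Log in to the Hugging Face Hub so the gated model can be downloaded."""
    # Imported lazily so resolve_hf_token() works without huggingface_hub.
    from huggingface_hub import login

    login(token=resolve_hf_token())
```

Calling `authenticate()` once at startup, before the model is loaded, should be enough. Failing early with a clear message is friendlier than a permission error raised in the middle of the model download.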

Usage

1. Enter a text prompt describing the audio you want to generate
2. Adjust the duration slider (1-11 seconds)
3. Click "Generate" to create 4 variations
4. Listen to and download your favorite variations
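
The duration setting maps directly to a number of audio samples. As a rough illustration, assuming the 44.1 kHz sample rate and the 1-11 second range listed in this README, the sketch below clamps a requested duration and computes how many samples per channel it corresponds to. The function names are hypothetical and are not taken from the Space's code.

```python
SAMPLE_RATE = 44_100  # Hz, as listed under Model Information
MIN_SECONDS = 1.0     # lower bound of the duration slider
MAX_SECONDS = 11.0    # upper bound of the duration slider


def clamp_duration(seconds: float) -> float:
    """Keep a requested duration inside the supported 1-11 second range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, float(seconds)))


def samples_for(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples per channel for the (clamped) duration."""
    return int(round(clamp_duration(seconds) * sample_rate))


print(samples_for(11))   # 485100 samples per channel at the maximum duration
print(samples_for(2.5))  # 110250
```

Since the output is stereo, an 11 second clip holds 2 x 485,100 = 970,200 sample values in total.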

Example Prompts

"128 BPM tech house drum loop"
"Ocean waves crashing on beach"
"Jazz piano melody"
"Rainforest ambience with bird calls"
"Electronic synth pad"

Model Limitations

The model cannot generate realistic vocals
Trained on English descriptions, so it may perform worse with prompts in other languages
Better at generating sound effects and field recordings than music
Performance varies across different music styles and cultures
Prompt engineering may be required for best results

Technical Details

Steps: 8 (optimized for speed)
CFG Scale: 1.0
Sampler: pingpong
Batch Size: 4 (for generating variations)
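
The settings above can be sketched as a call into the stable-audio-tools library. This is an assumption about how such a Space is typically wired up, not a copy of this Space's app.py: it assumes the `stable_audio_tools` package with `get_pretrained_model` and `generate_diffusion_cond`, and the helper names `build_conditioning`, `sampler_settings`, and `generate_variations` are hypothetical.

```python
from typing import Any, Dict, List

MODEL_ID = "stabilityai/stable-audio-open-small"


def build_conditioning(prompt: str, seconds: float, batch_size: int = 4) -> List[Dict[str, Any]]:
    """One conditioning entry per variation; all share the prompt and duration."""
    return [{"prompt": prompt, "seconds_total": seconds} for _ in range(batch_size)]


def sampler_settings(batch_size: int = 4) -> Dict[str, Any]:
    """The values listed under Technical Details, as keyword arguments."""
    return {
        "steps": 8,
        "cfg_scale": 1.0,
        "sampler_type": "pingpong",
        "batch_size": batch_size,
    }


def generate_variations(prompt: str, seconds: float, device: str = "cpu"):
    """Generate a batch of variations (assumes stable_audio_tools is installed)."""
    # Heavy imports are kept local so the helpers above work without them.
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    model, model_config = get_pretrained_model(MODEL_ID)
    model = model.to(device)
    settings = sampler_settings()
    return generate_diffusion_cond(
        model,
        conditioning=build_conditioning(prompt, seconds, settings["batch_size"]),
        sample_size=model_config["sample_size"],
        device=device,
        **settings,
    )
```

Repeating the same conditioning entry once per batch item is what yields several variations from one prompt: each item in the batch starts from different noise, so the outputs differ even though the text and duration are identical.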

License

This Space uses the Stability AI Community License. For commercial use, please refer to stability.ai/license.

Model Card

For more information about the model, training data, and limitations, see the stabilityai/stable-audio-open-small model card on Hugging Face.

Research Paper

Stable Audio Open: An Open Generative Audio Model