title: Stable Audio Open Small - 4 Variations
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: other

Stable Audio Open Small - 4 Variations

Generate 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.

Model Information

Model: stabilityai/stable-audio-open-small

  • Type: Latent diffusion model (DiT) with autoencoder
  • Sample Rate: 44.1 kHz
  • Format: Stereo audio
  • Max Duration: 11 seconds
  • License: Stability AI Community License

Features

  • 4 Variations: Generate 4 different audio variations from a single prompt
  • Text-to-Audio: Simple text prompt interface
  • Variable Duration: Control audio length (1-11 seconds)
  • Fast Generation: Uses the pingpong sampler with only 8 steps

Setup

This model requires accepting the license agreement on Hugging Face. To use this Space:

  1. Accept the model license: Visit stabilityai/stable-audio-open-small and accept the license agreement
  2. Create an access token: In your Hugging Face account, go to Settings > Access Tokens and create a token with "read" permissions
  3. Add token to Space: In your Space settings, go to "Variables and secrets" and add a new secret:
    • Name: HF_TOKEN
    • Value: Your access token
    • Add it as a secret, not a variable, so the value stays private
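
The token setup above can be sketched in Python as follows. This is a minimal illustration, not the Space's actual `app.py`: the helper names `resolve_hf_token` and `login_to_hub` are hypothetical, and it assumes the `huggingface_hub` package is installed so that its `login` function is available.

```python
import os


def resolve_hf_token(env=None):
    """Return the HF_TOKEN secret, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings."
        )
    return token


def login_to_hub(token):
    """Authenticate so the gated model files can be downloaded."""
    # Imported lazily so resolve_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
```

A startup script would typically call `login_to_hub(resolve_hf_token())` once, before loading the model.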

Usage

  1. Enter a text prompt describing the audio you want to generate
  2. Adjust the duration slider (1-11 seconds)
  3. Click "Generate" to create 4 variations
  4. Listen to and download your favorite variations
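
For readers curious how such an interface might be wired, here is a rough Gradio sketch of the steps above. It is an assumption about the layout, not the Space's actual code: `build_demo`, `clamp_duration`, and the `generate_fn` callback are hypothetical names, and the slider's default value is a guess.

```python
def clamp_duration(seconds, lo=1.0, hi=11.0):
    """Keep the requested duration inside the 1-11 second range."""
    return max(lo, min(hi, float(seconds)))


def build_demo(generate_fn):
    """Wire a prompt box, a duration slider, and four audio players together.

    generate_fn is a hypothetical callback: (prompt, seconds) -> 4 audio values.
    """
    import gradio as gr  # lazy import: only needed when the UI is built

    with gr.Blocks(title="Stable Audio Open Small - 4 Variations") as demo:
        prompt = gr.Textbox(label="Prompt", placeholder="128 BPM tech house drum loop")
        seconds = gr.Slider(minimum=1, maximum=11, value=11, step=1,
                            label="Duration (seconds)")
        button = gr.Button("Generate")
        players = [gr.Audio(label=f"Variation {i + 1}") for i in range(4)]
        # One click fills all four players with the callback's return values.
        button.click(
            fn=lambda p, s: generate_fn(p, clamp_duration(s)),
            inputs=[prompt, seconds],
            outputs=players,
        )
    return demo
```

Calling `build_demo(my_generate_fn).launch()` would start the interface, where `my_generate_fn` stands for whatever function produces the four clips.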

Example Prompts

  • "128 BPM tech house drum loop"
  • "Ocean waves crashing on beach"
  • "Jazz piano melody"
  • "Rainforest ambience with bird calls"
  • "Electronic synth pad"

Model Limitations

  • The model is not able to generate realistic vocals
  • Trained on English descriptions, so it may not perform as well in other languages
  • Better at generating sound effects and field recordings than music
  • Performance varies across different music styles and cultures
  • Prompt engineering may be required for best results

Technical Details

  • Steps: 8 (optimized for speed)
  • CFG Scale: 1.0
  • Sampler: pingpong
  • Batch Size: 4 (for generating variations)
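
The settings above map onto a batched sampling call roughly as shown below. This is a sketch that assumes the `stable_audio_tools` package and its `get_pretrained_model` and `generate_diffusion_cond` functions; the helper names `build_conditioning` and `generate_variations` are hypothetical, and the Space's own code may differ.

```python
GENERATION_SETTINGS = {"steps": 8, "cfg_scale": 1.0, "sampler_type": "pingpong"}
BATCH_SIZE = 4


def build_conditioning(prompt, seconds, batch_size=BATCH_SIZE):
    """Repeat one prompt batch_size times, one conditioning dict per variation."""
    return [{"prompt": prompt, "seconds_total": seconds} for _ in range(batch_size)]


def generate_variations(prompt, seconds, device="cuda"):
    """Run one batched sampling pass and return the generated audio tensor."""
    # Lazy imports: these heavy dependencies are only needed for real inference.
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
    model = model.to(device)
    return generate_diffusion_cond(
        model,
        conditioning=build_conditioning(prompt, seconds),
        batch_size=BATCH_SIZE,
        sample_size=model_config["sample_size"],
        device=device,
        **GENERATION_SETTINGS,
    )
```

Because every item in the batch starts from its own random noise, a single prompt produces four different clips in one pass.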

License

This Space uses the Stability AI Community License. For commercial use, please refer to stability.ai/license.

Model Card

For more information about the model, training data, and limitations, see the model card.

Research Paper

Stable Audio Open: An Open Generative Audio Model