ComfyUI Custom Node

This repository includes a custom node for ComfyUI integration:

🔗 ComfyUI-SoulX-Singer

Use this custom node to run SoulX-Singer singing voice synthesis directly in your ComfyUI workflows.

SoulX-Singer: Converted .pt model to .safetensors

bf16 + fp32

Audio Samples

Original Audio

SpongeBob Voice

Male Voice

Towards High-Quality Zero-Shot Singing Voice Synthesis

Overview

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that generates realistic singing voices for singers it was not trained on. It supports two control modes: melody-conditioned, driven by an F0 contour, and score-conditioned, driven by MIDI notes, for precise control over pitch, rhythm, and expression.

For more details, please refer to the paper: SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis.
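
To make the difference between the two conditioning signals concrete, here is a minimal sketch of how a note-level score relates to a frame-level F0 contour. It is illustrative only and is not SoulX-Singer's input format: the `Note` class, the `notes_to_f0` helper, and the 100 frames-per-second rate are assumptions. The only fixed fact used is the standard equal-temperament mapping from MIDI note number to frequency (A4 = MIDI 69 = 440 Hz).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """One score event. Hypothetical structure, not SoulX-Singer's input format."""
    midi_pitch: Optional[int]  # MIDI note number (69 = A4); None marks a rest
    duration_s: float          # note length in seconds


def midi_to_hz(midi_pitch: int) -> float:
    """Equal-temperament conversion with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def notes_to_f0(notes: List[Note], frame_rate_hz: float = 100.0) -> List[float]:
    """Expand a note sequence into a piecewise-constant F0 contour.

    Each frame holds the note's frequency in Hz. Rest frames hold 0.0,
    a common convention for unvoiced frames (assumed here).
    """
    contour: List[float] = []
    for note in notes:
        n_frames = round(note.duration_s * frame_rate_hz)
        hz = 0.0 if note.midi_pitch is None else midi_to_hz(note.midi_pitch)
        contour.extend([hz] * n_frames)
    return contour


# A4 for 0.5 s, a 0.1 s rest, then C5 for 0.3 s.
score = [Note(69, 0.5), Note(None, 0.1), Note(72, 0.3)]
f0 = notes_to_f0(score)
print(len(f0), f0[0], round(f0[-1], 2))
```

A contour extracted from a real recording would also carry vibrato, glides, and timing deviations that a flat, score-derived contour like this one lacks, which is why the two signals give different kinds of control.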

Features

Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
Melody-conditioned control: Use F0 contour for pitch guidance
Score-conditioned control: Use MIDI notes for precise musical notation
High-fidelity output: Realistic vocal synthesis with natural expression
Safetensors format: Model weights converted from .pt to .safetensors in bf16 + fp32 precision
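
To check which precision the tensors in a given .safetensors file use, you can read the file header without loading any weights. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header) and the Python standard library. The helper names, the tensor names, and the demo file are hypothetical; this page does not say how the bf16 and fp32 weights are split across files, so the demo builds a tiny synthetic file instead of assuming a real filename.

```python
import json
import os
import struct
import tempfile
from collections import Counter
from typing import Dict


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_dtypes(header: Dict[str, dict]) -> Counter:
    """Count tensors per dtype string (e.g. "BF16", "F32"), skipping metadata."""
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )


# Build a tiny synthetic file so the sketch runs on its own.
# Tensor names below are made up for illustration.
demo_header = {
    "__metadata__": {"format": "pt"},
    "layer.weight": {"dtype": "BF16", "shape": [2, 2], "data_offsets": [0, 8]},
    "layer.norm": {"dtype": "F32", "shape": [2], "data_offsets": [8, 16]},
}
header_bytes = json.dumps(demo_header).encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    with open(demo_path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(bytes(16))  # zero-filled payload: 8 bytes BF16 + 8 bytes F32
    dtype_counts = count_dtypes(read_safetensors_header(demo_path))

print(dict(dtype_counts))
```

To inspect a downloaded checkpoint, pass its local path to `read_safetensors_header` in place of `demo_path`.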

Citation

If you use SoulX-Singer in your research, please cite:

@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}

License

This project is licensed under the Apache License 2.0.
