ComfyUI Custom Node

This repository includes a custom node for ComfyUI integration:

🔗 ComfyUI-SoulX-Singer

Use this custom node to integrate SoulX-Singer into your ComfyUI workflows for seamless singing voice synthesis.

SoulX-Singer: Converted .pt model to .safetensors

bf16 + fp32

Audio Samples

Original Audio

SpongeBob Voice

Male Voice


Towards High-Quality Zero-Shot Singing Voice Synthesis



Overview

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
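To make the relationship between the two conditioning modes concrete, here is a minimal sketch that expands a note-level score (MIDI note numbers with durations) into a frame-level F0 contour using the standard equal-temperament formula. The helper names, the 50 frames-per-second rate, and the convention of writing rests as 0 Hz are illustrative assumptions, not SoulX-Singer's actual input format.

```python
A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # standard concert pitch in Hz


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def notes_to_f0(notes, frame_rate=50):
    """Expand (midi_note, duration_seconds) pairs into a per-frame F0 list.

    A note value of None marks a rest and is written as 0.0 Hz.
    Both the frame rate and the rest convention are assumptions
    made for this sketch.
    """
    f0 = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        hz = 0.0 if note is None else midi_to_hz(note)
        f0.extend([hz] * n_frames)
    return f0


# Hypothetical three-event score: C4, a short rest, then A4.
score = [(60, 0.5), (None, 0.1), (69, 0.4)]
contour = notes_to_f0(score)
print(len(contour), round(contour[0], 2), contour[-1])
```

A contour extracted from a real recording would also carry vibrato and pitch glides, which a flat per-note expansion like this one does not capture. That is one way to see why the two control modes are distinct.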

For more details, please refer to the paper: SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis.


Features

  • Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
  • Melody-conditioned control: Use F0 contour for pitch guidance
  • Score-conditioned control: Use MIDI notes for precise musical notation
  • High-fidelity output: Realistic vocal synthesis with natural expression
  • Safetensors format: Optimized model weights in bf16 + fp32 precision
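For readers unfamiliar with the two precisions named in the last bullet, the sketch below shows in plain Python how a bf16 value relates to an fp32 one: bf16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of the fp32 pattern. It therefore uses 2 bytes per value instead of 4, at reduced precision. The function names are hypothetical, NaN and infinity handling is omitted, and this is not the conversion script used to produce the weights in this repository.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x after rounding to fp32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 (round-to-nearest-even); return the 16-bit pattern.

    Sketch only: NaN and infinity are not handled specially.
    """
    bits = fp32_bits(x)
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


weight = 0.1234567  # arbitrary example value
as_bf16 = bf16_bits_to_float(to_bf16_bits(weight))
relative_error = abs(as_bf16 - weight) / weight
print(f"bf16 value: {as_bf16!r}, relative error: {relative_error:.2e}")
```

With 8 bits of significand precision (7 stored plus the implicit leading bit), the relative rounding error of a bf16 value stays below about 0.4 percent.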

Citation

If you use SoulX-Singer in your research, please cite:

@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}

License

This project is licensed under the Apache License 2.0.
