ComfyUI Custom Node
This repository includes a custom node for ComfyUI integration:
🔗 ComfyUI-SoulX-Singer
Use this custom node to run SoulX-Singer singing voice synthesis directly inside your ComfyUI workflows.
SoulX-Singer: .pt model converted to .safetensors (bf16 + fp32)
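One way to check which precision a given .safetensors file uses is to read its header. A .safetensors file starts with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype. The sketch below reads that header using only the Python standard library. The helper names are hypothetical and not part of this repository, and the file name in the usage line is a placeholder.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the tensor entries of a .safetensors header as a dict.

    Assumed layout (safetensors format): 8-byte little-endian unsigned
    length N, then N bytes of UTF-8 JSON, then the raw tensor bytes.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def dtype_summary(path):
    """Count tensors per dtype string, e.g. 'BF16' or 'F32'."""
    return Counter(info["dtype"] for info in read_safetensors_header(path).values())


if __name__ == "__main__":
    # Placeholder path: substitute the actual weight file you downloaded.
    print(dtype_summary("model.safetensors"))
```

Reading only the header avoids loading the full weights into memory. The official `safetensors` package offers the same information through `safe_open` if you already have it installed.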
Audio Samples
- Original Audio
- SpongeBob Voice
- Male Voice
Overview
SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
For more details, please refer to the paper: SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis (arXiv:2602.07803).
Features
- Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
- Melody-conditioned control: Use F0 contour for pitch guidance
- Score-conditioned control: Use MIDI notes for precise musical notation
- High-fidelity output: Realistic vocal synthesis with natural expression
- Safetensors format: Model weights converted from .pt to .safetensors, in bf16 + fp32 precision
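The two control modes above are connected by the standard equal-temperament mapping between MIDI note number m and frequency in Hz: f = 440 · 2^((m - 69) / 12). The sketch below uses this mapping to turn a toy note list into a frame-level F0 contour. It only illustrates how score and melody conditioning relate and is not SoulX-Singer's own preprocessing. The helper names, the 100 frames-per-second rate, and the use of 0.0 for unvoiced (rest) frames are all assumptions.

```python
import math


def midi_to_hz(note, a4_hz=440.0):
    """Equal temperament: MIDI note 69 (A4) maps to a4_hz."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def hz_to_midi(f0_hz, a4_hz=440.0):
    """Inverse mapping; returns a fractional MIDI value for a positive F0."""
    return 69.0 + 12.0 * math.log2(f0_hz / a4_hz)


def score_to_f0(notes, frame_rate=100):
    """Render (midi_note, duration_seconds) pairs into a frame-level F0 list.

    Hypothetical helper: a note of None marks a rest and yields 0.0
    (unvoiced) frames. frame_rate is an assumed value, not the model's.
    """
    contour = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        f0 = 0.0 if note is None else midi_to_hz(note)
        contour.extend([f0] * n_frames)
    return contour


if __name__ == "__main__":
    # C4 for 0.5 s, a 0.1 s rest, then E4 for 0.5 s.
    contour = score_to_f0([(60, 0.5), (None, 0.1), (64, 0.5)])
    print(len(contour), contour[0], contour[55], contour[-1])
```

A score gives one flat pitch per note, while a melody (F0) contour can also carry vibrato and pitch glides. This is why the two modes trade notational precision against expressive detail.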
Citation
If you use SoulX-Singer in your research, please cite:
@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
License
This project is licensed under the Apache License 2.0.
