Is this broken or out of date? Lots of errors trying to get this to work.
I am seeing errors related to the kernel size pattern in the bezzam encoder: the strided convolutions with kernels [4, 4, 8, 10, 10, 16] and the corresponding downsampling ratios. Among other things, actually.
Not sure what could be affecting it, other than perhaps some overlap with your references to the official Microsoft one (which has also had this model removed).
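For anyone debugging the same thing, here is a minimal sketch of how a stack of strided 1D convolutions with the kernel sizes quoted above shrinks a sequence. The strides are an assumption (each stride taken as half its kernel size); the thread does not state the actual downsampling ratios, and `conv1d_out_len` and `encoder_out_len` are hypothetical helper names, not part of the checkpoint's code. It might help to check whether the shape errors come from a kernel/stride mismatch or from inputs that are too short.

```python
import math

def conv1d_out_len(length, kernel, stride, padding=0):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel) // stride + 1

KERNELS = [4, 4, 8, 10, 10, 16]      # kernel sizes quoted in the post above
STRIDES = [k // 2 for k in KERNELS]  # ASSUMPTION: stride = kernel / 2 (not from the thread)

def encoder_out_len(length, kernels=KERNELS, strides=STRIDES):
    """Sequence length after applying every strided convolution in order (no padding)."""
    for kernel, stride in zip(kernels, strides):
        if length < kernel:
            raise ValueError(
                f"input of length {length} is shorter than kernel size {kernel}"
            )
        length = conv1d_out_len(length, kernel, stride)
    return length

# Overall downsampling factor implied by the assumed strides.
total_stride = math.prod(STRIDES)

if __name__ == "__main__":
    print("assumed strides:", STRIDES)
    print("total downsampling factor:", total_stride)
    print("frames for 24000 input samples:", encoder_out_len(24000))
```

If the real encoder pads its inputs or uses different ratios, the numbers will differ; the point is only to compare the expected length at each layer against the shapes reported in the error messages.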
@Fieldsweeper this is a draft checkpoint to get a version working with Transformers, so it's normal that it isn't working: the code to use it (here) is still in progress / under review. I can let you know when it's ready for testing if you want to try it out before it gets merged into Transformers?
Should I be using the latest version of the code in that PR? It looks like progress has stalled.
Hi @SerialVelocity, thanks for your message. Yes, that PR is still waiting for a review. We've had lots of releases in between, but merging it into Transformers is still in the plan.
This draft checkpoint and the 1.5B draft should normally work with the latest code from the PR. Let me know if it doesn't!