web demo
TTS Performance
Underlined "TTS*" and "Judy*" are internal 🐸TTS models that have not been released as open source. They are included to show the potential. Models prefixed with a dot (.Jofish, .Abe, and .Janice) are real human voices.
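Tortoise TTS is a text-to-speech program that converts text into realistic speech. It ships with multiple voices and can imitate the timbre and intonation of different speakers, so you can pick a voice style to suit your needs. The Tortoise TTS source code includes everything needed to run the model in inference mode.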
Started to save /u/M4xM9450’s comment on the topic of open source TTS models.
Disclaimer: I’m far from an expert in this field, but I saw some desire to have a shared resource.
Please feel free to suggest or comment to clean this up or extend as you see fit.
All 11 of the pre-trained Whisper checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table:
| Size | Layers | Width | Heads | Parameters | English-only | Multilingual |
|---|---|---|---|---|---|---|
| tiny | 4 | 384 | 6 | 39 M | ✓ | ✓ |
| base | 6 | 512 | 8 | 74 M | ✓ | ✓ |
| small | 12 | 768 | 12 | 244 M | ✓ | ✓ |
| medium | 24 | 1024 | 16 | 769 M | ✓ | ✓ |
| large | 32 | 1280 | 20 | 1550 M | x | ✓ |
| large-v2 | 32 | 1280 | 20 | 1550 M | x | ✓ |
| large-v3 | 32 | 1280 | 20 | 1550 M | x | ✓ |
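
As a rough sketch, the table can be turned into a list of Hub model IDs. The `openai/whisper-<size>` and `openai/whisper-<size>.en` naming pattern is an assumption based on how the checkpoints are commonly published; the `SIZES` dictionary and `checkpoint_ids` helper are hypothetical names used only for illustration.

```python
# Availability per size, copied from the table above:
# (english_only_available, multilingual_available)
SIZES = {
    "tiny": (True, True),
    "base": (True, True),
    "small": (True, True),
    "medium": (True, True),
    "large": (False, True),
    "large-v2": (False, True),
    "large-v3": (False, True),
}


def checkpoint_ids(org="openai"):
    """Build Hub-style model IDs for every checkpoint in the table.

    Assumes English-only checkpoints carry a ".en" suffix.
    """
    ids = []
    for size, (english_only, multilingual) in SIZES.items():
        if english_only:
            ids.append(f"{org}/whisper-{size}.en")
        if multilingual:
            ids.append(f"{org}/whisper-{size}")
    return ids


if __name__ == "__main__":
    for model_id in checkpoint_ids():
        print(model_id)
```

Four sizes with two variants each plus three multilingual-only large variants account for the 11 checkpoints mentioned above.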
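Large-scale models pre-trained on pure (unlabeled) speech have made great progress recently (wav2vec2, et al.).
However, the authors argue that a speech recognition system should work out of the box in general environments, rather than needing a dedicated decoder set up and fine-tuned with supervision for each individual dataset.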
“The goal of a speech recognition system should be to work reliably “out of the box” in a broad range of environments without requiring supervised fine-tuning of a decoder for every deployment distribution.”
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio (680,000 hours of multilingual and multitask supervised data) and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
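
A minimal sketch of how transcription and translation might be run through the Hugging Face `transformers` automatic-speech-recognition pipeline. It assumes `transformers`, a backend such as PyTorch, and `ffmpeg` are installed; the helper names `build_generate_kwargs` and `run_whisper`, the default checkpoint, and the audio path are illustrative assumptions, not part of the original text. Language identification is not covered here.

```python
def build_generate_kwargs(task="transcribe", language=None):
    """Assemble generation options for a Whisper checkpoint.

    task: "transcribe" keeps the spoken language, "translate" outputs English.
    language: optional language name hint, e.g. "french".
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    kwargs = {"task": task}
    if language is not None:
        kwargs["language"] = language
    return kwargs


def run_whisper(audio_path, model_id="openai/whisper-tiny",
                task="transcribe", language=None):
    """Run Whisper on an audio file and return the decoded text."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path,
                 generate_kwargs=build_generate_kwargs(task, language))
    return result["text"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path; replace it with a real audio file.
    print(run_whisper("sample.wav", task="translate", language="french"))
```

The task is selected per call rather than per model, which reflects the multitask design described above: the same multilingual checkpoint handles both transcription and translation.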