Open Source Text-to-Speech Models (TTS)
This page started as a way to save /u/M4xM9450’s comment on the topic of open source TTS models.
Disclaimer: I’m far from an expert in this field, but I saw some desire to have a shared resource.
Please feel free to suggest changes or leave comments to clean this up or extend it as you see fit.
Neural TTS Models
Tacotron
- Submitted: Mar 29, 2017
- Paper: Tacotron: Towards End-to-End Speech Synthesis
- Github: keithito/tacotron (not the official implementation, but the most cited one)
Tacotron2
- Submitted: Dec 16, 2017
- Paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- Github: NVIDIA/tacotron2
Transformer TTS
- Submitted: Sep 19, 2018
- Paper: Neural Speech Synthesis with Transformer Network
- Github: N/A
Flowtron
- Submitted: May 12, 2020
- Paper: Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
- Github: NVIDIA/flowtron
FastSpeech2
- Submitted: Jun 8, 2020
- Paper: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
- Github: ming024/FastSpeech2 (not the official implementation, but the most cited one)
FastPitch
- Submitted: Jun 11, 2020
- Paper: FastPitch: Parallel Text-to-speech with Pitch Prediction
- Github: NVIDIA/DeepLearningExamples
TalkNet (1/2)
- Submitted: May 12, 2020 / Apr 16, 2021
- Paper: TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model / TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
- Github: NVIDIA/NeMo
GlowTTS
- Submitted: May 22, 2020
- Paper: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
- Github: jaywalnut310/glow-tts
GradTTS (Diffusion TTS)
- Submitted: May 13, 2021
- Paper: Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
- Github: huawei-noah/Speech-Backbones
RadTTS
- Submitted: Aug 18, 2021
- Paper: RadTTS: Parallel Flow-based TTS with Robust Alignment Learning and Diverse Synthesis
- Github: NVIDIA/radtts
Neural-HMMs
- Submitted: Aug 30, 2021
- Paper: Neural HMMs are All You Need (for High-Quality Attention-Free TTS)
- Github: shivammehta25/Neural-HMM
OverFlow
- Submitted: Nov 13, 2022
- Paper: OverFlow: Putting flows on top of neural transducers for better TTS
- Github: shivammehta25/OverFlow
Matcha-TTS
- Submitted: Sep 6, 2023
- Paper: Matcha-TTS: A fast TTS architecture with conditional flow matching
- Github: shivammehta25/Matcha-TTS
Vocoders (Mel-spec to Audio)
WaveNet
- Submitted: Sep 12, 2016
- Paper: WaveNet: A Generative Model for Raw Audio
- Github: N/A
WaveGlow
- Submitted: Oct 31, 2018
- Paper: WaveGlow: A Flow-based Generative Network for Speech Synthesis
- Github: NVIDIA/waveglow
HiFiGAN
- Submitted: Oct 12, 2020
- Paper: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
- Github: jik876/hifi-gan
MixerTTS
- Submitted: Oct 7, 2021
- Paper: Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
- Github: NVIDIA/NeMo
VITS
- Submitted: Jun 11, 2021
- Paper: VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
- Github: jaywalnut310/vits
STYLER
- Submitted: Mar 17, 2021
- Paper: STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
- Github: keonlee9420/STYLER
TorToiseTTS
- Submitted: N/A
- Paper: N/A
- Github: neonbjb/tortoise-tts
DiffTTS (DiffSinger)
- Submitted: Apr 3, 2021
- Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- Github: keonlee9420/DiffSinger
Uncategorised / Unevaluated
eSpeak
- Submitted: N/A
- Paper: eSpeak: Text to Speech Synthesizer
- Github: espeak-ng/espeak-ng
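For anyone who wants to try one of these engines quickly, here is a minimal sketch of calling the `espeak-ng` command-line tool from Python. It assumes `espeak-ng` is installed and on your PATH; the helper names `build_espeak_command` and `synthesize` are made up for illustration and are not part of eSpeak NG. Only the `-v` (voice), `-s` (speed) and `-w` (WAV output) options are used.

```python
import shutil
import subprocess


def build_espeak_command(text, wav_path, voice="en", words_per_minute=150):
    """Return the argv list for a one-shot espeak-ng synthesis to a WAV file.

    Hypothetical helper for illustration; only the flags below come from espeak-ng.
    """
    return [
        "espeak-ng",
        "-v", voice,                  # voice / language name
        "-s", str(words_per_minute),  # speaking rate in words per minute
        "-w", str(wav_path),          # write audio to a WAV file instead of playing it
        text,
    ]


def synthesize(text, wav_path, **options):
    """Run espeak-ng if it is available; return False when it is not installed."""
    if shutil.which("espeak-ng") is None:
        return False  # assumption: the binary is named "espeak-ng" on this system
    subprocess.run(build_espeak_command(text, wav_path, **options), check=True)
    return True


if __name__ == "__main__":
    # Print the command that would be executed, without requiring espeak-ng.
    print(build_espeak_command("Hello from eSpeak NG.", "hello.wav"))
```

Calling `synthesize("Hello", "hello.wav", voice="en-us")` should write `hello.wav` when the tool is present. The neural models listed above generally need a model checkpoint and a Python environment instead of a single binary.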
CMU Flite TTS
- Submitted: N/A
- Paper: Flite: A Small Fast Run-Time Synthesis Engine
- Github: festvox/flite
MaryTTS
- Submitted: N/A
- Paper: MaryTTS: An Open-Source Text-to-Speech Synthesis System
- Github: marytts/marytts
Mimic 3
- Note: Not to be confused with MIMIC-III, a medical database.
- Submitted: N/A
- Paper: N/A
- Github: MycroftAI/mimic3
MBROLA
- Submitted: N/A
- Paper: MBROLA: A Speech Synthesis System Based on a Harmonic-Stochastic Model
- Github: numediart/MBROLA
SpeechT5
- Submitted: Oct 14, 2021
- Paper: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
- Github: microsoft/SpeechT5
SpeechBrain
- Submitted: N/A
- Paper: N/A
- Github: speechbrain/speechbrain
- Website: SpeechBrain
Coqui-ai TTS
- Submitted: N/A
- Paper: reference paper
- Github: coqui-ai/TTS
- Website: coqui.ai
Credits
/u/M4xM9450’s comment on the topic of open source TTS models.