Open Source Text-to-Speech Models (TTS)

Aya

This page started as a way to save /u/M4xM9450’s comment on the topic of open source TTS models.
Disclaimer: I’m far from an expert in this field, but I saw some desire for a shared resource.
Please feel free to suggest changes or leave a comment to clean this up or extend it as you see fit.

Neural TTS Models

Tacotron

Submitted: Mar 29, 2017
Paper: Tacotron: Towards End-to-End Speech Synthesis
Github: keithito/tacotron (not the official implementation, but the most-cited one)

Tacotron2

Submitted: Dec 16, 2017
Paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Github: NVIDIA/tacotron2

Transformer TTS

Submitted: Sep 19, 2018
Paper: Neural Speech Synthesis with Transformer Network
Github: N/A

Flowtron

Submitted: May 12, 2020
Paper: Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Github: NVIDIA/flowtron

FastSpeech2

Submitted: Jun 8, 2020
Paper: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Github: ming024/FastSpeech2 (not the official implementation, but the most-cited one)

FastPitch

Submitted: Jun 11, 2020
Paper: FastPitch: Parallel Text-to-speech with Pitch Prediction
Github: NVIDIA/DeepLearningExamples

TalkNet (1/2)

Submitted: May 12, 2020 / Apr 16, 2021
Paper: TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model / TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Github: NVIDIA/NeMo

GlowTTS

Submitted: May 22, 2020
Paper: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Github: jaywalnut310/glow-tts

GradTTS (Diffusion TTS)

Submitted: May 13, 2021
Paper: Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Github: huawei-noah/Speech-Backbones

RadTTS

Submitted: Aug 18, 2021
Paper: RadTTS: Parallel Flow-based TTS with Robust Alignment Learning and Diverse Synthesis
Github: NVIDIA/radtts

Neural-HMMs

Submitted: Aug 30, 2021
Paper: Neural HMMs are All You Need (for High-Quality Attention-Free TTS)
Github: shivammehta25/Neural-HMM

OverFlow

Submitted: Nov 13, 2022
Paper: OverFlow: Putting flows on top of neural transducers for better TTS
Github: shivammehta25/OverFlow

Matcha-TTS

Submitted: Sep 6, 2023
Paper: Matcha-TTS: A fast TTS architecture with conditional flow matching
Github: shivammehta25/Matcha-TTS

Vocoders (Mel-spec to Audio)

WaveNet

Submitted: Sep 12, 2016
Paper: WaveNet: A Generative Model for Raw Audio
Github: N/A

WaveGlow

Submitted: Oct 31, 2018
Paper: WaveGlow: A Flow-based Generative Network for Speech Synthesis
Github: NVIDIA/waveglow

HiFiGAN

Submitted: Oct 12, 2020
Paper: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Github: jik876/hifi-gan

MixerTTS

Submitted: Oct 7, 2021
Paper: Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
Github: NVIDIA/NeMo

VITS

Submitted: Jun 11, 2021
Paper: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Github: jaywalnut310/vits

STYLER

Submitted: Mar 17, 2021
Paper: STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Github: keonlee9420/STYLER

TorToiseTTS

Submitted: N/A
Paper: N/A
Github: neonbjb/tortoise-tts

DiffTTS (DiffSinger)

Submitted: Apr 3, 2021
Paper: Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Github: keonlee9420/DiffSinger

Uncategorised / Unevaluated

eSpeak

Submitted: N/A
Paper: eSpeak: Text to Speech Synthesizer
Github: espeak-ng/espeak-ng

CMU Flite TTS

Submitted: N/A
Paper: Flite: A Small Fast Run-Time Synthesis Engine
Github: festvox/flite

MaryTTS

Submitted: N/A
Paper: The German Text-to-Speech Synthesis System MARY: A Tool for Research, Development and Teaching
Github: marytts/marytts

Mimic 3

Note: Not to be confused with MIMIC-III, a medical database.
Submitted: N/A
Paper: N/A
Github: MycroftAI/mimic3

MBROLA

Submitted: N/A
Paper: The MBROLA Project: Towards a Set of High Quality Speech Synthesizers Free of Use for Non Commercial Purposes
Github: numediart/MBROLA

SpeechT5

Submitted: Oct 14, 2021
Paper: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Github: microsoft/SpeechT5

SpeechBrain

Submitted: N/A
Paper: N/A
Github: speechbrain/speechbrain
Website: SpeechBrain

Coqui-ai TTS

Submitted: N/A
Paper: reference paper
Github: coqui-ai/TTS
Website: coqui.ai

Credits

/u/M4xM9450’s comment on the topic of open source TTS models.
