Larynx is an open-source, fully functional, end-to-end [tts] solution. It builds on several other [vocal-synthesis] and [tts] libraries and toolkits to make speech synthesis easy through an API and a simple web interface. Getting it running on my computer took a single command: it launched a Docker container that started the API and made over 50 pre-trained voices available. The voices sound quite good, and synthesis was extremely fast.
GitHub repository: https://github.com/rhasspy/larynx
If I were looking for a practical, high-quality, easy-to-use, functional TTS engine, this is the one I'd go for.
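As a rough idea of what using the API from a script could look like, here is a minimal sketch. Everything specific in it is an assumption on my part and not something recorded in these notes: the server address `localhost:5002`, the `/api/tts` path, and the `text` and `voice` query parameters should all be checked against the readme before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed server address; adjust to wherever the container is listening.
BASE_URL = "http://localhost:5002"


def tts_url(text, voice, base_url=BASE_URL):
    """Build a GET URL for a synthesis request.

    The "/api/tts" path and the "text"/"voice" parameter names are
    assumptions, not confirmed by these notes.
    """
    query = urlencode({"text": text, "voice": voice})
    return f"{base_url}/api/tts?{query}"


def synthesize(text, voice, out_path="out.wav"):
    """Request audio for `text` and save the response body to `out_path`.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    with urlopen(tts_url(text, voice)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the container running, something like `synthesize("Hello there.", "<voice id>")` should leave an audio file on disk, where the voice id is whatever name the web interface shows for a downloaded voice.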
Their readme says that it's built using [[gruut]], a "tokenizer, text cleaner, and IPA phonemizer for several human languages", as well as [[onnx]] (Open Neural Network Exchange), an open format for representing [[neural network]]/deep learning models. It uses [glowtts] to perform the actual synthesis, and HiFi-GAN (a [GAN]) as a [vocoder].
Architecture diagram from their readme:
"Voices are trained on [phoneme] ids and [mel-spectrogram]"
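It seems to support voices from [glowtts]; the web interface allows for downloading different voices and apparently [vocoder]s as well. These are probably separate models, not the same thing: going by the readme, a voice turns [phoneme] ids into a [mel-spectrogram], and the [vocoder] (HiFi-GAN) turns that spectrogram into audio.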
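One way to picture the readme's description is as three stages chained together: phonemization, a voice model, and a vocoder. The sketch below is a toy illustration of that data flow only. All function names are hypothetical stand-ins, none of them are Larynx, gruut, GlowTTS, or HiFi-GAN APIs, and the numbers they produce are meaningless.

```python
from typing import Callable, List

PhonemeIds = List[int]
Mel = List[List[float]]  # one inner list per spectrogram frame
Audio = List[float]      # waveform samples


def run_pipeline(
    text: str,
    phonemize: Callable[[str], PhonemeIds],
    voice: Callable[[PhonemeIds], Mel],
    vocoder: Callable[[Mel], Audio],
) -> Audio:
    """Chain the three stages: text -> phoneme ids -> mel frames -> audio."""
    ids = phonemize(text)
    mel = voice(ids)
    return vocoder(mel)


# Toy stand-ins so the sketch runs; real stages would be trained models.
def toy_phonemize(text: str) -> PhonemeIds:
    # Pretend each letter is a phoneme and use its code point as the id.
    return [ord(c) for c in text.lower() if c.isalpha()]


def toy_voice(ids: PhonemeIds) -> Mel:
    # Pretend each phoneme yields one two-bin spectrogram frame.
    return [[float(i), float(i)] for i in ids]


def toy_vocoder(mel: Mel) -> Audio:
    # Pretend each frame collapses to a single sample (its mean).
    return [sum(frame) / len(frame) for frame in mel]


samples = run_pipeline("ab", toy_phonemize, toy_voice, toy_vocoder)
```

Because the voice and the vocoder are separate arguments, either one can be swapped without touching the other, which would fit with the web interface offering them as separate downloads.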