HTS (the HMM-based Speech Synthesis System) is an open source [vocal synthesis] framework that forms the basis of several higher-level open source speech synthesis tools, including [sinsy] and [open-jtalk].
Website: http://hts.sp.nitech.ac.jp/
At its core are voice models ([htsvoice] files), which are generated by training on labeled audio. A speaker records a corpus, each recording is annotated with phonetic and linguistic context labels (phonemes plus surrounding prosodic features), and a set of hidden Markov models is trained on that data. The resulting model can then be loaded by HTS and used to generate waveforms from arbitrary text input.
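To make "labeled" a bit more concrete: in the standard HTS recipes, each phoneme in the training audio gets a "full-context" label line with start/end times (in HTK's 100-nanosecond units) followed by a quinphone context and several dozen additional prosodic fields. The lines below are a simplified, invented illustration of the shape of that format, not labels from any real corpus:

```
# start    end      p1^p2-p3+p4=p5 (real labels append ~50 more context fields)
      0  3050000  x^x-sil+hh=ax
3050000  4600000  x^sil-hh+ax=l
4600000  5150000  sil^hh-ax+l=ow
```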
There is a C library, hts_engine API, that handles much of the heavy lifting involved in loading HTS voice models and synthesizing audio from them: https://github.com/Ameobea/hts_engine_API It is imported and used directly by [sinsy]. There's also a closely related project called [festival] that seems to use at least parts of HTS internally.
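As a rough sketch of what driving that library directly looks like, based on the public functions in hts_engine_API 1.x's HTS_engine.h (the voice and label file paths here are placeholders; note that hts_engine itself doesn't do text analysis, so a frontend like [open-jtalk] has to turn raw text into a full-context label file first):

```c
#include <stdio.h>
#include <stdlib.h>

#include "HTS_engine.h"

int main(void) {
    HTS_Engine engine;

    /* Placeholder paths: substitute a real .htsvoice model and a
     * full-context label file produced by a text frontend. */
    char *voices[] = {"nitech_jp_atr503_m001.htsvoice"};
    const char *label_file = "input.lab";

    HTS_Engine_initialize(&engine);

    /* Load one or more voice models. */
    if (HTS_Engine_load(&engine, voices, 1) != TRUE) {
        fprintf(stderr, "failed to load voice model\n");
        HTS_Engine_clear(&engine);
        return EXIT_FAILURE;
    }

    /* Synthesize a waveform from the full-context label file. */
    if (HTS_Engine_synthesize_from_fn(&engine, label_file) != TRUE) {
        fprintf(stderr, "synthesis failed\n");
        HTS_Engine_clear(&engine);
        return EXIT_FAILURE;
    }

    /* Write the generated audio out as a RIFF/WAV file. */
    FILE *out = fopen("output.wav", "wb");
    if (out != NULL) {
        HTS_Engine_save_riff(&engine, out);
        fclose(out);
    }

    HTS_Engine_refresh(&engine);
    HTS_Engine_clear(&engine);
    return EXIT_SUCCESS;
}
```

Something like `gcc synth.c -lHTSEngine -lm` should link it, though the exact library name depends on how hts_engine_API was built on your system.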
HTS and the surrounding ecosystem are very much past their prime, judging by everything I've seen. The project has been around for a very long time, and a great deal of research effort and grant money clearly went into it over the years, but there has been very little activity at all in the past 5-10 years.
As I do more research and learn more, this isn't surprising. The whole thing is built on old, obsolete technology: the entire tool suite is designed for 2000s-era Unix environments, and the synthesis is based on hidden Markov models while the world has moved on to massively powerful bleeding-edge artificial neural networks.
I personally think that investing time into this framework and its associated ecosystem is no longer worth it; its time has simply passed.