Introduction to Speech Synthesis Text to Speech
This post introduces the background knowledge behind text-to-speech. I hope it helps readers understand how speech synthesis works and lays a foundation for understanding state-of-the-art text-to-speech algorithms.

This introduction is based mainly on the appendix of the paper "WaveNet: A Generative Model for Raw Audio". The link to the paper is as follows: blogs.com/BaroC/p/4283380.html.

Neural network approaches generally treat waveform generation as classification: at each time step a softmax classifier outputs a distribution over 256 classes, and each class corresponds to one of 256 quantized values of the sound amplitude. WaveRNN and WaveNet both generate audio in this way.
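To make the 256-class idea concrete, here is a minimal sketch in plain Python. It assumes the 256 levels come from mu-law companding with mu = 255, which is a common choice for 8-bit audio quantization but is not stated in the text above. The function names (`mu_law_encode`, `mu_law_decode`, `sample_amplitude`) are hypothetical helpers for illustration, and the logits stand in for the output of a real network.

```python
import math
import random

MU = 255  # assumption: mu-law with 256 levels (class indices 0..255)


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class index in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)  # round to nearest level


def mu_law_decode(index, mu=MU):
    """Map a class index in [0, mu] back to an amplitude in [-1, 1]."""
    compressed = 2 * index / mu - 1
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_amplitude(logits, rng=random):
    """Draw one class from the softmax distribution and decode it."""
    probs = softmax(logits)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, mu_law_decode(index)


# Placeholder logits: a real model would predict these from past samples.
logits = [0.0] * 256
logits[200] = 1000.0  # make class 200 overwhelmingly likely
index, amplitude = sample_amplitude(logits)
```

In an autoregressive model, the sampled value would be fed back as input to predict the next time step, so a waveform is built one quantized sample at a time.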

The following are some of the materials I used to study speech synthesis. Stanford CS224S is highly recommended, although the logic of its handout is not very clear, so I only understood it after reading it several times.

The UCSB digital speech processing course covers the basics of sound signal processing. I suggest you have a look. The link is as follows: /view/68 fbf 1a4f 6 1fb 7360 b4c 658 b . html