This project is maintained by Shreeram Chandra
Authors: Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman
Speech and Machine Learning Lab - The University of Texas at Dallas
Submitted to Speaker Odyssey 2024. Code will be released after acceptance.
No. | FastSpeech2 [1] | VITS [2] | TEMOTTS |
---|---|---|---|
1. | |||
2. | |||
3. | |||
4. | |||
5. | |||
6. | |||
7. | |||
No. | Text | FastSpeech2 [1] | VITS [2] | TEMOTTS |
---|---|---|---|---|
1. | Blowing out birthday candles makes me feel special! | |||
2. | Her heart felt heavy with sorrow. | |||
3. | I am feeling sad. | |||
4. | I feel joy when I see colourful balloons. | |||
5. | I feel like a broken toy discarded and forgotten. | |||
6. | I'm about to explode with anger! | |||
7. | I'm so angry I can't even breathe. | |||
8. | I'm so angry I could spit fire. | |||
9. | Playing with toys brings me so much happiness! | |||
10. | She felt like a part of her was missing. | |||
11. | Singing and dancing make me feel so good. | |||
12. | Smiling at others fills me with happiness. | |||
13. | Tears welled up in her eyes. | |||
14. | This is driving me crazy. | |||
15. | Watching a funny movie makes me laugh out loud. | |||
[1] Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu, “FastSpeech 2: Fast and high-quality end-to-end text to speech,” in International Conference on Learning Representations, 2021.
[2] Jaehyeon Kim, Jungil Kong, and Juhee Son, “Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech,” in International Conference on Machine Learning. PMLR, 2021, pp. 5530–5540.