Deep Voice 2 (Summary by: Ricardo Reimao)

Original paper: https://arxiv.org/pdf/1705.08947.pdf

A few months after the publication of the Deep Voice paper, researchers from the same company (Baidu) published Deep Voice 2, which expands on the originally proposed methodology. In the second paper, the authors propose a few improvements over the original publication: multi-speaker support, segmentation of modules, and an increase in training data. Although the overall architectures of the two methodologies are very similar, the authors tune the system to achieve better performance and to provide these new features.

Deep Voice by Baidu Labs (Summary by: Ricardo Reimao)

Original paper: https://arxiv.org/abs/1702.07825

In early 2017, researchers from Baidu Labs published a paper on their project, Deep Voice. The project aims to create a production-level, end-to-end solution for Text-To-Speech (TTS). The key point of Deep Voice is that it doesn’t require any specialist knowledge during training or inference, and that it can generate audio in real time.

Deep Voice breaks the TTS problem down into five models.