Detection of GAN-Generated Fake Images over Social Networks (Summary By: Ricardo Reimao)

Original paper: https://ieeexplore.ieee.org/document/8397040

The diffusion of fake images and videos on social networks is a fast-growing problem. Commercial media editing tools allow anyone to remove, add, or clone people and objects to generate fake images.

Deep Voice 2 (Summary by: Ricardo Reimao)

Original paper: https://arxiv.org/pdf/1705.08947.pdf

A few months after the publication of the Deep Voice paper, researchers from the same company published Deep Voice 2, which expands on the methodology proposed in the first paper. In the second paper, the authors propose a few improvements to the original publication: multi-speaker support, segmentation of modules, and an increase in training data. Although the overall architectures of the two methodologies are very similar, the authors tune the system to achieve better performance and to provide the above-mentioned new features.

Deep Voice by Baidu Labs (Summary by: Ricardo Reimao)

Original Paper: https://arxiv.org/abs/1702.07825

In early 2017, researchers from Baidu Labs published a paper about their project, Deep Voice. The project aims to create a production-level, end-to-end solution for Text-To-Speech (TTS). The key points of Deep Voice are that it doesn’t require any specialist knowledge during the training/inference process and that it is able to generate audio in real time.

Deep Voice breaks down the TTS problem into five models.

Hinton’s Capsules, LAS, and PixelCNN

This week’s topics are Hinton’s capsules, LAS (Listen, Attend and Spell), and PixelCNN. The reading I’ve linked for Hinton’s capsules is a blog post, and you only need to understand the intuition behind it. If you want to read more, I’ve also linked the paper. LAS is an interesting paper that should introduce you to the idea of encoder-decoder architectures in speech recognition. The last paper for this week is PixelCNN. It is worth checking out because its intuition is also used in an architecture that will be covered in the following week.