Dion Tu – APTLY and LaSSoftE

The diffusion of fake images and videos on social networks is a fast-growing problem. Commercial media editing tools allow anyone to remove, add, or clone people and objects to generate fake images. Continue reading “Detection of GAN-Generated Fake Images over Social Networks (Summary By: Ricardo Reimao)”

March 6, 2019 (updated March 20, 2019)

Deep Voice 3 (Summary by: Ricardo Reimao)

Original paper: https://arxiv.org/abs/1710.07654

After the success of DeepVoice 1 and DeepVoice 2, researchers from the same company published a paper on DeepVoice 3. In this new paper, the researchers discarded the architecture used in DeepVoice 1 and DeepVoice 2 and introduced a completely novel neural Continue reading “Deep Voice 3 (Summary by: Ricardo Reimao)”

February 21, 2019

Deep Voice 2 (Summary by: Ricardo Reimao)

Original paper: https://arxiv.org/pdf/1705.08947.pdf

A few months after the publication of the Deep Voice paper, researchers from the same company published Deep Voice 2, an expansion of the first proposed methodology. In the second paper, the authors propose a few improvements to the original publication: multi-speaker support, segmentation of modules, and an increase in training data. Although the overall architectures of both methodologies are very similar, the authors tune the system to achieve better performance and provide the above-mentioned new features. Continue reading “Deep Voice 2 (Summary by: Ricardo Reimao)”

February 16, 2019

Deep Voice by Baidu Labs (Summary by: Ricardo Reimao)

Original Paper: https://arxiv.org/abs/1702.07825

In early 2017, researchers from Baidu Labs published a paper about their project, DeepVoice. The project aims to create a production-level, end-to-end solution for text-to-speech (TTS). The key point of DeepVoice is that it doesn’t require any specialist knowledge during training or inference, and that it can generate audio in real time.

DeepVoice breaks down the TTS problem into five models: Continue reading “Deep Voice by Baidu Labs (Summary by: Ricardo Reimao)”

November 14, 2018 (updated January 25, 2019)

WaveNet, Image style transfer with CNN, and Attention Mechanisms

This week, we talk about WaveNet, image style transfer with convolutional neural networks, and attention mechanisms. To get a better understanding of WaveNet, read the blog first and then continue with the paper. Continue reading “WaveNet, Image style transfer with CNN, and Attention Mechanisms”

November 4, 2018 (updated January 25, 2019)

Hinton’s Capsules, LAS, and PixelCNN

This week’s topics are Hinton’s capsules, LAS (Listen, Attend and Spell), and PixelCNN. The reading I’ve linked for Hinton’s capsules is a blog, and you only need to understand the intuition behind it. If you want to read more about it, I’ve linked the paper as well. LAS is an interesting paper that should introduce you to the idea of encoder-decoder architectures in speech recognition. The last paper for this week is PixelCNN. It is worth checking out because its intuition is also used in an architecture that will be mentioned in the following week. Continue reading “Hinton’s Capsules, LAS, and PixelCNN”

October 29, 2018 (updated January 25, 2019)

ResNets, HighwayNets, DenseNets and GANs

ResNets, HighwayNets, and DenseNets all revolve around addressing the vanishing gradient problem. If you are not familiar with that term, or would like a refresher, check out the first link below. Also check out the next topic, Generative Adversarial Networks. Continue reading “ResNets, HighwayNets, DenseNets and GANs”

October 22, 2018 (updated January 25, 2019)

More on CNNs and Intro to Recurrent Neural Networks

By now you’ve probably familiarized yourself with the convolutional neural network that won the ImageNet challenge in 2012. To see how it has evolved, check out the first paper that I’ve linked below, where researchers from Google expanded the typical CNN architecture and created what they called “Inception”. Continue reading “More on CNNs and Intro to Recurrent Neural Networks”

October 12, 2018 (updated January 25, 2019)

Convolutional Neural Networks and Deconvolutional Networks

Understand the idea of convolutional neural networks by reading the paper and parts 1 and 2 of the blog. Then read about deconvolutional networks and their applications. Continue reading “Convolutional Neural Networks and Deconvolutional Networks”

October 12, 2018 (updated January 25, 2019)

Learn more about APTLY

The APTLY lab continuously explores the latest artificial neural network architectures in audio processing, transcription, and generation. Each week, we take a look at a couple of academic papers or blogs and discuss their contributions and how they affect the field of audio. Continue reading “Learn more about APTLY”