[2018 Academic Talk 09] Efficient recurrent and feedforward neural network architectures for acoustic modelling
Title: Efficient recurrent and feedforward neural network architectures for acoustic modelling
Speaker: Chao Zhang, Associate Professor, University of Cambridge
Time: 10:00 a.m., Wednesday, 29 August
Place: 1-315, FIT Building
Organizer: Research Institute of Information Technology (RIIT), Tsinghua University
Chao Zhang received his B.E. and M.S. degrees in 2009 and 2012 respectively, both for work on accented Mandarin speech recognition, from the Department of Computer Science and Technology at Tsinghua University. In 2017, he obtained his Ph.D. in Engineering from Cambridge University Engineering Department under the supervision of Prof. Phil Woodland, for work on the joint optimisation of multiple modules in a DNN-based ASR system. While pursuing his Ph.D., he also created the DNN modules in HTK and released HTK version 3.5. Chao has published more than 30 papers in top speech conferences and journals, including best student papers at NCMMSC 2011 and ICASSP 2014, a best paper candidate at ASRU 2015, and papers that received student grants and other awards. As a key member of the Cambridge team, he took part in several international ASR project evaluations and challenges, including IARPA Babel 2013, DARPA BOLT 2014, and ASRU 2015 MGB, and built the most important and best-performing systems.
Deep artificial neural network (ANN) models, such as the standard recurrent neural network (RNN), the long short-term memory (LSTM) model, the time-delay neural network (TDNN), and the convolutional neural network (CNN), have become essential components of state-of-the-art automatic speech recognition (ASR) systems. This talk covers a range of work from Cambridge University on novel ANN architectures for acoustic modelling. It includes tackling vanishing gradients with high-order (Markov) RNNs, reducing gating complexity in LSTMs and highway networks with semi-tied units, improving TDNN performance with residual deep kernels and with frequency convolutions provided by frequency-dependent CNNs and Grid-RNNs, and an alternative output layer based on Gaussian mixture models that allows traditional methods to be integrated almost seamlessly. Some recent advances in model training and optimisation are also briefly discussed.