[2018 Academic Talk 04] A Bayesian Approach to Deep Neural Network Adaptation with Applications to Robust Automatic Speech Recognition
Title: A Bayesian Approach to Deep Neural Network Adaptation with Applications to Robust Automatic Speech Recognition
Speaker: Chin-Hui Lee, Professor, School of Electrical and Computer Engineering, Georgia Institute of Technology
Time: 10:00-12:00, May 28 (Mon.)
Place: 1-315, FIT Building
Organizer: Research Institute of Information Technology (RIIT), Tsinghua University
Biography: Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and patents, with more than 30,000 citations and an h-index of 75 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the IEEE Signal Processing Society (SPS) 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal for Scientific Achievement for "pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition".
Abstract: The discriminative nature of deep neural networks (DNNs) makes adaptation challenging when only a small amount of data is available to adjust a large number of DNN parameters. This is also known as the catastrophic forgetting problem in DNN-based transfer learning. In this talk, we formulate a Bayesian framework that addresses this problem while retaining the framework's desirable theoretical properties. Building on the success of Bayesian adaptation in GMM-HMM systems, we propose two distinct Bayesian formulations for DNN-HMM systems, called direct and indirect DNN adaptation. The former adds a prior term to any DNN-based learning objective function; the latter uses a bottleneck layer to learn a GMM for each shared tied state at the outputs of a DNN. In experiments on the WSJ and Switchboard tasks, we found that both maximum a posteriori (MAP) and structural MAP (SMAP) speaker adaptation improve performance over already-strong speaker-independent systems.
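
The idea behind direct adaptation, adding a prior term to the learning objective, can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes a Gaussian prior centered on the speaker-independent weights, which reduces to an L2 penalty pulling the adapted weights toward them, and it uses a single logistic layer as a stand-in for a DNN. The names `map_adapt`, `w_si`, and `lam` are hypothetical, and the data is synthetic.

```python
import numpy as np

def map_adapt(w_si, X, y, lam, lr=0.1, steps=200):
    """Gradient descent on: mean cross-entropy + (lam / 2) * ||w - w_si||^2.

    Assumption: the prior is an isotropic Gaussian centered on the
    speaker-independent weights w_si; lam plays the role of its precision.
    """
    w = w_si.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        grad_ce = X.T @ (p - y) / len(y)     # gradient of the mean cross-entropy
        grad_prior = lam * (w - w_si)        # gradient of the prior penalty
        w -= lr * (grad_ce + grad_prior)
    return w

# Synthetic "adaptation set": only 10 examples, labeled by a rule that
# disagrees with the speaker-independent weights.
rng = np.random.default_rng(0)
w_si = np.array([1.0, -1.0, 0.5])
X = rng.normal(size=(10, 3))
y = (X @ np.array([0.2, 0.8, -0.5]) > 0).astype(float)

w_weak = map_adapt(w_si, X, y, lam=0.01)    # weak prior: follows the new data
w_strong = map_adapt(w_si, X, y, lam=10.0)  # strong prior: stays near w_si

print("drift with weak prior:  ", np.linalg.norm(w_weak - w_si))
print("drift with strong prior:", np.linalg.norm(w_strong - w_si))
```

A larger `lam` keeps the adapted weights closer to the speaker-independent model, which is how a prior term can limit forgetting when adaptation data is scarce. A smaller `lam` lets the small adaptation set dominate.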