Janne Pylkkönen is defending his PhD thesis "Towards Efficient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training" on Friday, 22nd of March, 2013 at Aalto University School of Science. The research has been conducted in a research group focusing on speech technology, led by Prof. Mikko Kurimo. Dr. Erik McDermott, Google, is serving as the opponent and Prof. Erkki Oja as the custos. Speech recognition research has long roots in Otaniemi: Academician Teuvo Kohonen was already conducting active research in this area in the early 1980s and, with his team, developed the famous neural phonetic typewriter.
Pylkkönen's thesis presents methods for decoding and acoustic modeling in large-vocabulary continuous speech recognition, with two main contributions. First, he has developed a large-vocabulary decoder that is especially suitable for morphologically rich languages. Second, he has improved the discriminative training of acoustic models to increase their robustness. The thesis also includes a theoretical analysis of discriminative training in which the extended Baum-Welch algorithm is formulated as a constrained optimization method. The methods have been tested using a speech recognition system that has been developed over the years with contributions from a number of researchers.
In his opening statement, Erik McDermott first explained that he works in the speech division at Google, which develops the speech recognition capabilities of Android phones. He mentioned that Android has speech recognition for 40 languages, including Finnish. He reminded the audience of the challenges related to speech recognition: there are still fundamental problems with the technology, and therefore active research is still needed. McDermott recognized the important contributions of Teuvo Kohonen and Erkki Oja in providing what he called an organic view of pattern recognition systems. In essence, this refers to their contributions to unsupervised machine learning, where systems improve over time based on a data-driven approach.
In the discussion, several methodological themes were considered in detail, including decoding techniques, the potential use of finite-state transducers, pruning techniques, discriminative training, maximum likelihood modeling and Gaussian mixtures.