Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance
- Authors
- Kao, Chao Yuan; Ko, Hanseok
- Issue Date
- Nov-2019
- Publisher
- ACOUSTICAL SOC KOREA
- Keywords
- Speech enhancement; Wasserstein Generative Adversarial Network (WGAN); Weight initialization; Robust speech recognition; Deep Neural Network (DNN)
- Citation
- JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, v.38, no.6, pp.670 - 677
- Indexed
- SCOPUS
- KCI
- Journal Title
- JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA
- Volume
- 38
- Number
- 6
- Start Page
- 670
- End Page
- 677
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/62044
- DOI
- 10.7776/ASK.2019.38.6.670
- ISSN
- 1225-4428
- Abstract
- Because background noise in acoustic signals degrades the performance of speech and acoustic event recognition, extracting noise-robust acoustic features from noisy signals remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a Multi-Task AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only the noise but also the speech features from a noisy acoustic source. The proposed MTAE-WGAN structure estimates the speech signal and the residual noise, employing a gradient penalty and a weight initialization method for Leaky Rectified Linear Units (LReLU) and Parametric ReLUs (PReLU). With the adopted gradient-penalty loss function, the MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements for robust speech recognition over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED), and Recurrent MTAE (RMTAE) models.
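The abstract names three ingredients: a multi-task output (speech and noise estimates), a WGAN critic trained with a gradient penalty, and a weight initialization for LReLU/PReLU units. The pure-Python sketch below illustrates these ideas in isolation; it is not the authors' implementation. It assumes the standard WGAN-GP penalty, λ(‖∇D(x̂)‖₂ − 1)² evaluated at a random interpolate x̂ between a real and a generated sample (λ = 10), and assumes the initialization is the He (Kaiming) scheme with standard deviation √(2 / ((1 + a²)·fan_in)) for negative slope a. The toy one-hidden-layer critic, the slope a = 0.2, every function name, and the loss weights `alpha`/`beta` in `mtae_loss` are hypothetical choices made for illustration.

```python
import math
import random


def leaky_relu(z: float, a: float) -> float:
    """LReLU with negative slope a (a PReLU would learn a during training)."""
    return z if z >= 0.0 else a * z


def leaky_relu_grad(z: float, a: float) -> float:
    """Derivative of leaky_relu with respect to z (taking 1 at z == 0)."""
    return 1.0 if z >= 0.0 else a


def he_std(fan_in: int, a: float) -> float:
    """He/Kaiming normal std for a rectifier with negative slope a (a = 0: ReLU)."""
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def init_layer(fan_in: int, fan_out: int, a: float, rng: random.Random):
    """Weight matrix (fan_out rows x fan_in columns) drawn from N(0, he_std^2)."""
    std = he_std(fan_in, a)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def critic(x, W, v, a):
    """Toy one-hidden-layer critic D(x) = sum_j v_j * lrelu(w_j . x)."""
    return sum(vj * leaky_relu(sum(wi * xi for wi, xi in zip(wj, x)), a)
               for wj, vj in zip(W, v))


def critic_input_grad(x, W, v, a):
    """Analytic gradient of D with respect to its input x (chain rule)."""
    grad = [0.0] * len(x)
    for wj, vj in zip(W, v):
        z = sum(wi * xi for wi, xi in zip(wj, x))
        scale = vj * leaky_relu_grad(z, a)
        for i, wi in enumerate(wj):
            grad[i] += scale * wi
    return grad


def gradient_penalty(real, fake, W, v, a, rng, lam=10.0):
    """WGAN-GP term lam * (||grad D(x_hat)||_2 - 1)^2 at a random interpolate."""
    eps = rng.random()  # eps ~ U[0, 1)
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
    g = critic_input_grad(x_hat, W, v, a)
    norm = math.sqrt(sum(gi * gi for gi in g))
    return lam * (norm - 1.0) ** 2


def critic_loss(real, fake, W, v, a, rng, lam=10.0):
    """Single-sample WGAN-GP critic loss: D(fake) - D(real) + penalty."""
    return (critic(fake, W, v, a) - critic(real, W, v, a)
            + gradient_penalty(real, fake, W, v, a, rng, lam))


def mtae_loss(speech_hat, speech, noise_hat, noise, alpha=1.0, beta=1.0):
    """Hypothetical multi-task loss: weighted MSE on speech and noise outputs."""
    def mse(pred, target):
        return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return alpha * mse(speech_hat, speech) + beta * mse(noise_hat, noise)


if __name__ == "__main__":
    rng = random.Random(0)
    a = 0.2                             # assumed LReLU slope
    W = init_layer(8, 16, a, rng)       # hidden layer, He-initialized for LReLU
    v = init_layer(16, 1, 1.0, rng)[0]  # linear output: a = 1 gives variance 1/fan_in
    real = [rng.gauss(0.0, 1.0) for _ in range(8)]
    fake = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print("critic loss:", critic_loss(real, fake, W, v, a, rng))
```

The penalty pushes the critic's input-gradient norm toward 1 instead of clipping weights, which is the usual motivation for preferring it over the original WGAN weight clipping. In a real system the input gradient would come from automatic differentiation rather than the hand-derived `critic_input_grad` used here.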
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.