Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition
- Authors
- Kao, Chao-Yuan; Park, Sangwook; Badi, Alzahra; Han, David K.; Ko, Hanseok
- Issue Date
- May-2020
- Publisher
- IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
- Keywords
- speech enhancement; generative adversarial networks; deep learning; robust speech recognition
- Citation
- IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, v.E103D, no.5, pp.1195 - 1198
- Indexed
- SCIE
- SCOPUS
- Journal Title
- IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
- Volume
- E103D
- Number
- 5
- Start Page
- 1195
- End Page
- 1198
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/56123
- DOI
- 10.1587/transinf.2019EDL8183
- ISSN
- 1745-1361
- Abstract
- Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, trained with L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the OGP-enhanced features, when tested in ASR, achieve WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
- Appears in Collections
- College of Engineering > School of Electrical Engineering > 1. Journal Articles
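The abstract reports its ASR results as word error rate (WER) improvements. As a reminder of what that metric measures, the following is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and the record does not say how their WER figures were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Hypothetical helper for illustration; tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower WER is better, so an "improvement" over a baseline means the enhanced features yield a lower WER than the baseline does.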