Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kao, Chao-Yuan | - |
dc.contributor.author | Park, Sangwook | - |
dc.contributor.author | Badi, Alzahra | - |
dc.contributor.author | Han, David K. | - |
dc.contributor.author | Ko, Hanseok | - |
dc.date.accessioned | 2021-08-31T01:11:23Z | - |
dc.date.available | 2021-08-31T01:11:23Z | - |
dc.date.created | 2021-06-19 | - |
dc.date.issued | 2020-05 | - |
dc.identifier.issn | 1745-1361 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/56123 | - |
dc.description.abstract | Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks that apply L1 or L2 loss have been proposed. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder which estimates not only speech features but also noise features from noisy speech. The proposed OGP yields a 14.1% improvement in Wasserstein distance convergence rate, and the OGP-enhanced features, tested in ASR, achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over DDAE, MTAE, R-CED(CNN) and RNN models, respectively. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG | - |
dc.title | Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Ko, Hanseok | - |
dc.identifier.doi | 10.1587/transinf.2019EDL8183 | - |
dc.identifier.scopusid | 2-s2.0-85084854925 | - |
dc.identifier.wosid | 000530668200034 | - |
dc.identifier.bibliographicCitation | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, v.E103D, no.5, pp.1195 - 1198 | - |
dc.relation.isPartOf | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | - |
dc.citation.title | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | - |
dc.citation.volume | E103D | - |
dc.citation.number | 5 | - |
dc.citation.startPage | 1195 | - |
dc.citation.endPage | 1198 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
dc.subject.keywordAuthor | speech enhancement | - |
dc.subject.keywordAuthor | generative adversarial networks | - |
dc.subject.keywordAuthor | deep learning | - |
dc.subject.keywordAuthor | robust speech recognition | - |