Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kao, Chao Yuan | - |
dc.contributor.author | Ko, Hanseok | - |
dc.date.accessioned | 2021-09-01T01:20:39Z | - |
dc.date.available | 2021-09-01T01:20:39Z | - |
dc.date.created | 2021-06-19 | - |
dc.date.issued | 2019-11 | - |
dc.identifier.issn | 1225-4428 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/62044 | - |
dc.description.abstract | Because background noise in an acoustic signal degrades the performance of speech and acoustic event recognition, extracting noise-robust acoustic features from noisy signals remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a Multi-Task AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only noise but also speech features from a noisy acoustic source. The proposed MTAE-WGAN structure estimates the speech signal and the residual noise by employing a gradient penalty and a weight initialization method for Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). With the adopted gradient penalty loss function, the proposed MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED), and Recurrent MTAE (RMTAE) models for robust speech recognition. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ACOUSTICAL SOC KOREA | - |
dc.title | Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Ko, Hanseok | - |
dc.identifier.doi | 10.7776/ASK.2019.38.6.670 | - |
dc.identifier.scopusid | 2-s2.0-85079175884 | - |
dc.identifier.wosid | 000502020100006 | - |
dc.identifier.bibliographicCitation | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, v.38, no.6, pp.670 - 677 | - |
dc.relation.isPartOf | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | - |
dc.citation.title | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | - |
dc.citation.volume | 38 | - |
dc.citation.number | 6 | - |
dc.citation.startPage | 670 | - |
dc.citation.endPage | 677 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.identifier.kciid | ART002527290 | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | kci | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.subject.keywordAuthor | Speech enhancement | - |
dc.subject.keywordAuthor | Wasserstein Generative Adversarial Network (WGAN) | - |
dc.subject.keywordAuthor | Weight initialization | - |
dc.subject.keywordAuthor | Robust speech recognition | - |
dc.subject.keywordAuthor | Deep Neural Network (DNN) | - |
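
The abstract reports its results as Phoneme Error Rate (PER). This record does not describe the scoring protocol the authors used, so the following is only a minimal sketch, assuming the commonly used definition: the Levenshtein edit distance (substitutions, deletions, insertions) between the reference and recognized phoneme sequences, divided by the reference length. The function names `edit_distance` and `phoneme_error_rate` are hypothetical and chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the first i-1 reference phonemes
    # and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER = edit distance / number of reference phonemes (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, scoring the hypothesis `["sil", "k", "ah", "t"]` against the reference `["sil", "k", "ae", "t", "sil"]` counts one substitution and one deletion over five reference phonemes. Because insertions are counted against the reference length, PER under this definition can exceed 1.
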
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.