Detailed Information

Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kao, Chao Yuan | -
dc.contributor.author | Ko, Hanseok | -
dc.date.accessioned | 2021-09-01T01:20:39Z | -
dc.date.available | 2021-09-01T01:20:39Z | -
dc.date.created | 2021-06-19 | -
dc.date.issued | 2019-11 | -
dc.identifier.issn | 1225-4428 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/62044 | -
dc.description.abstract | Because background noise in an acoustic signal degrades the performance of speech and acoustic event recognition, extracting noise-robust acoustic features from a noisy signal remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a Multi-Task AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only the noise but also the speech features from a noisy acoustic source. The proposed MTAE-WGAN structure estimates the speech signal and the residual noise by employing a gradient penalty and a weight initialization method for the Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). With the adopted gradient penalty loss function, the proposed MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED), and Recurrent MTAE (RMTAE) models for robust speech recognition. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | ACOUSTICAL SOC KOREA | -
dc.title | Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Ko, Hanseok | -
dc.identifier.doi | 10.7776/ASK.2019.38.6.670 | -
dc.identifier.scopusid | 2-s2.0-85079175884 | -
dc.identifier.wosid | 000502020100006 | -
dc.identifier.bibliographicCitation | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, v.38, no.6, pp.670 - 677 | -
dc.relation.isPartOf | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | -
dc.citation.title | JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | -
dc.citation.volume | 38 | -
dc.citation.number | 6 | -
dc.citation.startPage | 670 | -
dc.citation.endPage | 677 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.identifier.kciid | ART002527290 | -
dc.description.journalClass | 1 | -
dc.description.journalRegisteredClass | scopus | -
dc.description.journalRegisteredClass | kci | -
dc.relation.journalResearchArea | Acoustics | -
dc.relation.journalWebOfScienceCategory | Acoustics | -
dc.subject.keywordAuthor | Speech enhancement | -
dc.subject.keywordAuthor | Wasserstein Generative Adversarial Network (WGAN) | -
dc.subject.keywordAuthor | Weight initialization | -
dc.subject.keywordAuthor | Robust speech recognition | -
dc.subject.keywordAuthor | Deep Neural Network (DNN) | -
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Engineering > School of Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Ko, Han seok
College of Engineering (School of Electrical Engineering)