Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance

Kao, Chao Yuan; Ko, Hanseok

doi:10.7776/ASK.2019.38.6.670

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kao, Chao Yuan	-
dc.contributor.author	Ko, Hanseok	-
dc.date.accessioned	2021-09-01T01:20:39Z	-
dc.date.available	2021-09-01T01:20:39Z	-
dc.date.created	2021-06-19	-
dc.date.issued	2019-11	-
dc.identifier.issn	1225-4428	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/62044	-
dc.description.abstract	Because background noise in an acoustic signal degrades the performance of speech and acoustic event recognition, extracting noise-robust acoustic features from noisy signals remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a Multi-Task AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only the noise but also the speech features from a noisy acoustic source. The proposed MTAE-WGAN structure estimates the speech signal and the residual noise by employing a gradient penalty and a weight initialization method for the Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). With the adopted gradient penalty loss function, the proposed MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED), and Recurrent MTAE (RMTAE) models for robust speech recognition.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	ACOUSTICAL SOC KOREA	-
dc.title	Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Ko, Hanseok	-
dc.identifier.doi	10.7776/ASK.2019.38.6.670	-
dc.identifier.scopusid	2-s2.0-85079175884	-
dc.identifier.wosid	000502020100006	-
dc.identifier.bibliographicCitation	JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, v.38, no.6, pp.670 - 677	-
dc.relation.isPartOf	JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA	-
dc.citation.title	JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA	-
dc.citation.volume	38	-
dc.citation.number	6	-
dc.citation.startPage	670	-
dc.citation.endPage	677	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.identifier.kciid	ART002527290	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scopus	-
dc.description.journalRegisteredClass	kci	-
dc.relation.journalResearchArea	Acoustics	-
dc.relation.journalWebOfScienceCategory	Acoustics	-
dc.subject.keywordAuthor	Speech enhancement	-
dc.subject.keywordAuthor	Wasserstein Generative Adversarial Network (WGAN)	-
dc.subject.keywordAuthor	Weight initialization	-
dc.subject.keywordAuthor	Robust speech recognition	-
dc.subject.keywordAuthor	Deep Neural Network (DNN)	-
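
The abstract mentions a weight initialization method for LReLU and PReLU but this record does not specify it. As an illustration only, the sketch below uses the widely known He-style rule, in which weights are drawn from a zero-mean Gaussian with variance 2 / ((1 + a^2) * fan_in) for negative slope a. The function names, the default slope of 0.01, and the choice of this rule are assumptions, not the authors' method.

```python
import math
import random


def leaky_relu_init_std(fan_in: int, negative_slope: float = 0.01) -> float:
    """Standard deviation of a He-style Gaussian init for a layer
    followed by a leaky or parametric ReLU with the given negative slope.

    Assumption: this is the generic He-style rule, not necessarily the
    initialization used in the paper.
    """
    if fan_in <= 0:
        raise ValueError("fan_in must be positive")
    variance = 2.0 / ((1.0 + negative_slope ** 2) * fan_in)
    return math.sqrt(variance)


def init_weights(fan_in: int, fan_out: int,
                 negative_slope: float = 0.01, seed: int = 0):
    """Return a fan_out x fan_in weight matrix (list of lists) sampled
    from N(0, std^2), with std given by leaky_relu_init_std."""
    rng = random.Random(seed)
    std = leaky_relu_init_std(fan_in, negative_slope)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]
```

With a slope of 0 the rule reduces to the plain ReLU case, and with a slope of 1 (a linear unit) the variance becomes 1 / fan_in. For PReLU, where the slope is learned, the slope's initial value would be passed as `negative_slope`.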

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Engineering > School of Electrical Engineering > 1. Journal Articles
