Hyperparameter experiments on end-to-end automatic speech recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 양형원 | - |
dc.contributor.author | 남호성 | - |
dc.date.accessioned | 2022-03-06T09:40:34Z | - |
dc.date.available | 2022-03-06T09:40:34Z | - |
dc.date.created | 2022-02-10 | - |
dc.date.issued | 2021 | - |
dc.identifier.issn | 2005-8063 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/137968 | - |
dc.description.abstract | End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduction of the self-attention network, Transformer. However, because of long training times and the large number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameters play a critical role in task performance, and which affect training speed. The trained Transformer network combines encoder and decoder networks with Connectionist Temporal Classification (CTC). We trained the model on Wall Street Journal (WSJ) SI-284 and tested it on dev93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and a range of values was used for each in the experiments. The results show that the “num blocks” and “linear units” hyperparameters in the encoder and decoder networks reduce the Word Error Rate (WER) significantly, and the performance gain is more prominent when they are altered in the encoder network. Training duration also increased linearly as the values of “num blocks” and “linear units” grew. Based on the experimental results, we combined the optimal values of each hyperparameter and reduced the WER to 2.9/1.9 on dev93 and eval92, respectively. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | 한국음성학회 (The Korean Society of Speech Sciences) | - |
dc.title | Hyperparameter experiments on end-to-end automatic speech recognition | - |
dc.title.alternative | Hyperparameter experiments on end-to-end automatic speech recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | 남호성 | - |
dc.identifier.bibliographicCitation | 말소리와 음성과학 (Phonetics and Speech Sciences), v.13, no.1, pp.45 - 51 | - |
dc.relation.isPartOf | 말소리와 음성과학 | - |
dc.citation.title | 말소리와 음성과학 | - |
dc.citation.volume | 13 | - |
dc.citation.number | 1 | - |
dc.citation.startPage | 45 | - |
dc.citation.endPage | 51 | - |
dc.type.rims | ART | - |
dc.identifier.kciid | ART002699005 | - |
dc.description.journalClass | 2 | - |
dc.description.journalRegisteredClass | kci | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | hyperparameters | - |
dc.subject.keywordAuthor | neural network | - |
dc.subject.keywordAuthor | optimization | - |
dc.subject.keywordAuthor | transformer | - |
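The abstract identifies “num blocks” and “linear units” as the hyperparameters with the largest effect on WER and training time. As a rough illustration, the sketch below lays out an ESPnet-style hyperparameter dictionary — the keys mirror ESPnet's naming conventions (`num_blocks`, `linear_units`), but every value is a hypothetical example, not the optimal set reported in the paper — and shows why cost grows linearly in both values.

```python
# Illustrative ESPnet-style Transformer hyperparameters.
# Keys follow ESPnet naming; the values are hypothetical examples only,
# NOT the optimal settings reported in the paper.
config = {
    "encoder": {
        "num_blocks": 12,      # number of stacked self-attention blocks
        "linear_units": 2048,  # width of the position-wise feed-forward layer
        "attention_heads": 4,
        "output_size": 256,    # model dimension d_model
    },
    "decoder": {
        "num_blocks": 6,
        "linear_units": 2048,
        "attention_heads": 4,
    },
    "ctc_weight": 0.3,  # hybrid CTC/attention interpolation weight
}

def feedforward_params(side: str) -> int:
    """Weight count of the feed-forward sublayers on one side of the model.

    Each block holds two linear maps, d_model -> linear_units -> d_model,
    so the count scales linearly in both num_blocks and linear_units --
    consistent with the linear growth in training time noted in the abstract.
    """
    c = config[side]
    d_model = config["encoder"]["output_size"]
    return c["num_blocks"] * 2 * d_model * c["linear_units"]

print(feedforward_params("encoder"))
print(feedforward_params("decoder"))
```

Doubling either `num_blocks` or `linear_units` doubles this count, which is one simple way to see why those two hyperparameters dominate both capacity and training duration.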