Detailed Information

Cited 0 times in Web of Science; cited 0 times in Scopus

Hyperparameter experiments on end-to-end automatic speech recognition

Full metadata record
dc.contributor.author: 양형원 (Hyungwon Yang)
dc.contributor.author: 남호성 (Hosung Nam)
dc.date.accessioned: 2022-03-06T09:40:34Z
dc.date.available: 2022-03-06T09:40:34Z
dc.date.created: 2022-02-10
dc.date.issued: 2021
dc.identifier.issn: 2005-8063
dc.identifier.uri: https://scholar.korea.ac.kr/handle/2021.sw.korea/137968
dc.description.abstract: End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduction of the self-attention network, the Transformer. However, because of long training times and the large number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameters play a critical role in task performance, and which in training speed. The model for training consists of Transformer encoder and decoder networks combined with Connectionist Temporal Classification (CTC). We trained the model on Wall Street Journal (WSJ) SI-284 and tested it on dev93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used in the experiments. The results show that the "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce the Word Error Rate (WER) significantly; however, the performance gain is more prominent when they are altered in the encoder network. Training duration also increased linearly as the values of "num blocks" and "linear units" grew. Based on the experimental results, we collected the optimal value of each hyperparameter and reduced the WER to 2.9/1.9 on dev93 and eval92, respectively.
dc.language: English
dc.language.iso: en
dc.publisher: 한국음성학회 (The Korean Society of Speech Sciences)
dc.title: Hyperparameter experiments on end-to-end automatic speech recognition
dc.title.alternative: Hyperparameter experiments on end-to-end automatic speech recognition
dc.type: Article
dc.contributor.affiliatedAuthor: 남호성 (Hosung Nam)
dc.identifier.bibliographicCitation: 말소리와 음성과학 (Phonetics and Speech Sciences), v.13, no.1, pp.45-51
dc.relation.isPartOf: 말소리와 음성과학 (Phonetics and Speech Sciences)
dc.citation.title: 말소리와 음성과학 (Phonetics and Speech Sciences)
dc.citation.volume: 13
dc.citation.number: 1
dc.citation.startPage: 45
dc.citation.endPage: 51
dc.type.rims: ART
dc.identifier.kciid: ART002699005
dc.description.journalClass: 2
dc.description.journalRegisteredClass: kci
dc.subject.keywordAuthor: automatic speech recognition
dc.subject.keywordAuthor: hyperparameters
dc.subject.keywordAuthor: neural network
dc.subject.keywordAuthor: optimization
dc.subject.keywordAuthor: transformer
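
The "num blocks" and "linear units" hyperparameters named in the abstract correspond to the Transformer's layer depth and feed-forward width in ESPnet's training configuration. As a minimal sketch only — the paper's actual configuration file is not part of this record, and the field names below are assumed from the espnet2-style Transformer YAML layout — varying those two values might look like:

```yaml
# Illustrative espnet2-style Transformer ASR configuration fragment.
# Assumed field names; not the configuration used in the paper.
encoder: transformer
encoder_conf:
    num_blocks: 12        # encoder depth ("num blocks"), varied in the experiments
    linear_units: 2048    # feed-forward width ("linear units"), varied in the experiments
    attention_heads: 4
    output_size: 256

decoder: transformer
decoder_conf:
    num_blocks: 6         # decoder-side counterpart; per the abstract, gains
    linear_units: 2048    # were smaller than for the encoder-side settings

model_conf:
    ctc_weight: 0.3       # joint CTC/attention training, as described in the abstract
```

Per the abstract's findings, increasing the encoder-side values yields the larger WER reduction, at the cost of training time growing roughly linearly with both settings.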
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Nam, Ho sung
College of Liberal Arts (Department of English Language and Literature)
