The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Chanjun | - |
dc.contributor.author | Seo, Jaehyung | - |
dc.contributor.author | Lee, Seolhwa | - |
dc.contributor.author | Lee, Chanhee | - |
dc.contributor.author | Lim, Heuiseok | - |
dc.date.accessioned | 2022-11-15T22:40:19Z | - |
dc.date.available | 2022-11-15T22:40:19Z | - |
dc.date.created | 2022-11-15 | - |
dc.date.issued | 2022-10 | - |
dc.identifier.issn | 2227-7390 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/145525 | - |
dc.description.abstract | Training a sequence-to-sequence (S2S) automatic speech recognition (ASR) post-processor requires parallel pairs (e.g., a speech recognition result and its human post-edited sentence) to construct the dataset, which demands a great amount of human labor. BackTranScription (BTS) is a data-building method proposed to mitigate this limitation of existing S2S-based ASR post-processors: it automatically generates large amounts of training data, reducing the time and cost of data construction. Despite the emergence of this novel approach, the BTS-based ASR post-processor still faces open research challenges and remains largely untested across diverse approaches. In this study, we highlight these challenges through detailed experiments that analyze the data-centric approach (i.e., controlling the amount of data without altering the model) and the model-centric approach (i.e., modifying the model). In other words, we point out problems with the current research trend of pursuing a model-centric approach and caution against ignoring the importance of the data. Our experimental results show that the data-centric approach outperformed the model-centric approach by 11.69, 17.64, and 19.02 points in F1-score, BLEU, and GLEU, respectively. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | MDPI | - |
dc.title | The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Lim, Heuiseok | - |
dc.identifier.doi | 10.3390/math10193618 | - |
dc.identifier.scopusid | 2-s2.0-85139941211 | - |
dc.identifier.wosid | 000867915700001 | - |
dc.identifier.bibliographicCitation | MATHEMATICS, v.10, no.19 | - |
dc.relation.isPartOf | MATHEMATICS | - |
dc.citation.title | MATHEMATICS | - |
dc.citation.volume | 10 | - |
dc.citation.number | 19 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Mathematics | - |
dc.relation.journalWebOfScienceCategory | Mathematics | - |
dc.subject.keywordAuthor | backtranscription | - |
dc.subject.keywordAuthor | machine translation | - |
dc.subject.keywordAuthor | data-centric | - |
dc.subject.keywordAuthor | model-centric | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | post-processor | - |