The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches (open access)
- Authors
- Park, Chanjun; Seo, Jaehyung; Lee, Seolhwa; Lee, Chanhee; Lim, Heuiseok
- Issue Date
- Oct-2022
- Publisher
- MDPI
- Keywords
- backtranscription; machine translation; data-centric; model-centric; automatic speech recognition; post-processor
- Citation
- MATHEMATICS, v.10, no.19
- Indexed
- SCIE; SCOPUS
- Journal Title
- MATHEMATICS
- Volume
- 10
- Number
- 19
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/145525
- DOI
- 10.3390/math10193618
- ISSN
- 2227-7390
- Abstract
- Training a sequence-to-sequence (S2S) automatic speech recognition (ASR) post-processor requires a parallel corpus (e.g., pairs of speech recognition results and human post-edited sentences), and building such a dataset demands a great deal of human labor. BackTranScription (BTS) is a data-building method proposed to mitigate this limitation of existing S2S-based ASR post-processors: it automatically generates large amounts of training data, reducing the time and cost of data construction. Despite the emergence of this approach, the BTS-based ASR post-processor still poses open research challenges and remains largely untested across diverse settings. In this study, we highlight these challenges through detailed experiments that compare a data-centric approach (i.e., controlling the amount of data without altering the model) with a model-centric approach (i.e., modifying the model). In doing so, we point out problems with the current research trend of pursuing model-centric approaches and caution against overlooking the importance of data. Our experimental results show that the data-centric approach outperformed the model-centric approach by +11.69, +17.64, and +19.02 in F1-score, BLEU, and GLEU, respectively.
- Appears in Collections
- Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
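
The abstract describes the data-centric approach as controlling the amount of data without model alteration. A minimal sketch of how such an experiment might prepare its training sets is given below. It is not the authors' code: the function name `nested_subsets`, the `Pair` type, the subset sizes, and the toy sentences are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A training pair: (ASR output, reference sentence). Hypothetical toy type.
Pair = Tuple[str, str]


def nested_subsets(
    pairs: Sequence[Pair], sizes: Sequence[int], seed: int = 0
) -> Dict[int, List[Pair]]:
    """Return one training subset per requested size.

    The pairs are shuffled once with a fixed seed, and each subset is a
    prefix of that shuffle, so every smaller subset is contained in every
    larger one. Only the amount of data changes between runs.
    """
    if any(n < 0 or n > len(pairs) for n in sizes):
        raise ValueError("each size must be between 0 and len(pairs)")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes)}


# Toy, made-up data for illustration only.
toy_pairs = [(f"asr output {i}", f"reference sentence {i}") for i in range(10)]
subsets = nested_subsets(toy_pairs, sizes=[2, 5, 10])
for size, subset in subsets.items():
    # A real experiment would train the same, unmodified model on each subset.
    print(size, len(subset))
```

Using nested prefixes of a single shuffle is one possible design choice: it keeps score differences between runs attributable to data volume rather than to which examples happened to be sampled.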