Word-Level Quality Estimation for Korean-English Neural Machine Translation
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Eo, Sugyeong | - |
dc.contributor.author | Park, Chanjun | - |
dc.contributor.author | Moon, Hyeonseok | - |
dc.contributor.author | Seo, Jaehyung | - |
dc.contributor.author | Lim, Heuiseok | - |
dc.date.accessioned | 2022-06-12T16:41:04Z | - |
dc.date.available | 2022-06-12T16:41:04Z | - |
dc.date.created | 2022-06-09 | - |
dc.date.issued | 2022 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/142159 | - |
dc.description.abstract | The quality estimation (QE) task aims to predict the quality of machine translation (MT) by referring to the source sentence and its MT output. The wide applicability of QE underscores the importance of QE research, but the enormous human labor required to construct QE datasets remains a challenge. This study proposes three automatic word-level pseudo-QE data construction strategies that use a monolingual or parallel corpus and an external machine translator, without human labor. We use each of these pseudo-QE datasets to fine-tune multilingual pretrained language models (mPLMs) such as cross-lingual language models (XLM), XLM-RoBERTa (XLM-R), and multilingual BART, and comparatively analyze the results. Because the datasets are created synthetically, we attempt to validate the objectivity of the QE model using four test sets translated by external translators from Google, Amazon, Microsoft, and Systran. XLM-R-large shows the best performance among the mPLMs. We also verify the reliability of the QE model through the small performance gaps between the different test sets. To the best of our knowledge, this is the first study to experiment with word-level Korean-English QE. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Word-Level Quality Estimation for Korean-English Neural Machine Translation | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Lim, Heuiseok | - |
dc.identifier.doi | 10.1109/ACCESS.2022.3169155 | - |
dc.identifier.scopusid | 2-s2.0-85129225412 | - |
dc.identifier.wosid | 000790767300001 | - |
dc.identifier.bibliographicCitation | IEEE ACCESS, v.10, pp.44964 - 44973 | - |
dc.relation.isPartOf | IEEE ACCESS | - |
dc.citation.title | IEEE ACCESS | - |
dc.citation.volume | 10 | - |
dc.citation.startPage | 44964 | - |
dc.citation.endPage | 44973 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordAuthor | Predictive models | - |
dc.subject.keywordAuthor | Data models | - |
dc.subject.keywordAuthor | Feature extraction | - |
dc.subject.keywordAuthor | Task analysis | - |
dc.subject.keywordAuthor | Annotations | - |
dc.subject.keywordAuthor | Costs | - |
dc.subject.keywordAuthor | Machine translation | - |
dc.subject.keywordAuthor | Quality estimation | - |
dc.subject.keywordAuthor | neural machine translation | - |
dc.subject.keywordAuthor | multilingual pretrained language model | - |
dc.subject.keywordAuthor | natural language processing | - |