Decoding Strategies for Improving Low-Resource Machine Translation

Park, Chanjun; Yang, Yeongwook; Park, Kinam; Lim, Heuiseok

doi:10.3390/electronics9101562

Cited 3 times in Web of Science

Cited 2 times in Scopus

Authors: Park, Chanjun; Yang, Yeongwook; Park, Kinam; Lim, Heuiseok

Issue Date: October 2020

Publisher: MDPI

Keywords: neural machine translation; Korean–English neural machine translation; transformer; efficiency processing; post-processing; decoding strategies

Citation: ELECTRONICS, v.9, no.10

Indexed: SCIE; SCOPUS

Journal Title: ELECTRONICS

Volume: 9

Number: 10

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/52651

DOI: 10.3390/electronics9101562

ISSN: 2079-9292

Abstract: Pre-processing and post-processing are significant aspects of natural language processing (NLP) application software. Pre-processing in neural machine translation (NMT) includes subword tokenization to alleviate the problem of unknown words, parallel corpus filtering that retains only data suitable for training, and data augmentation to ensure that the corpus contains sufficient content. Post-processing includes automatic post-editing and the application of various strategies during decoding in the translation process. Most recent NLP research is based on the Pretrain-Finetuning Approach (PFA). However, when small and medium-sized organizations with insufficient hardware attempt to provide NLP services, throughput and memory problems often occur. These difficulties increase when PFA is used to process low-resource languages, because PFA requires large amounts of data and the data for low-resource languages are often insufficient. Building on the premise that NMT model performance can be enhanced through pre-processing and post-processing strategies without changing the model, we applied various decoding strategies to Korean-English NMT, a low-resource language pair. Through comparative experiments, we showed that translation performance could be enhanced without changes to the model. We experimentally examined how performance changed in response to changes in beam size and to n-gram blocking, and whether performance was enhanced when a length penalty was applied. The results showed that these decoding strategies enhance performance and compare well with previous Korean-English NMT approaches. Therefore, the proposed methodology can improve the performance of NMT models without the use of PFA; this presents a new perspective for improving machine translation performance.
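The three decoding strategies named in the abstract (beam size, n-gram blocking, and a length penalty) can be illustrated with a minimal sketch. This is not the authors' implementation: the `step_fn` interface, the `EOS` token, the `toy_step` distribution, and all function names are hypothetical, and the length penalty uses the GNMT-style formula ((5 + length) / 6) ** alpha as an assumption, since the abstract does not state which formula the paper applies.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token


def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style penalty (an assumption); alpha = 0 disables it."""
    return ((5.0 + length) / 6.0) ** alpha


def repeats_ngram(tokens: List[str], candidate: str, n: int) -> bool:
    """True if appending `candidate` would repeat an n-gram already in `tokens`."""
    if n <= 0 or len(tokens) + 1 < n:
        return False
    new_ngram = tuple(tokens[len(tokens) - (n - 1):] + [candidate])
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 4,
    max_len: int = 20,
    alpha: float = 0.0,
    block_ngram: int = 0,
) -> List[str]:
    """Decode with a hypothetical `step_fn` mapping a prefix to next-token log-probs."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                # n-gram blocking: skip tokens that would repeat an n-gram.
                if tok != EOS and repeats_ngram(tokens, tok, block_ngram):
                    continue
                candidates.append((tokens + [tok], score + logp))
        # Beam size: keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    # Length penalty: rescale the summed log-probability before the final choice.
    best = max(
        finished,
        key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
        default=([], 0.0),
    )
    return best[0]


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical toy distribution that tends to repeat the token 'la'."""
    if len(prefix) >= 4:
        return {EOS: math.log(0.9), "la": math.log(0.1)}
    return {"la": math.log(0.6), "di": math.log(0.3), EOS: math.log(0.1)}


plain = beam_search(toy_step, beam_size=2)
blocked = beam_search(toy_step, beam_size=2, block_ngram=2)
```

In this toy setting, `plain` repeats the most probable token until the end-of-sentence token becomes likely, while `blocked` cannot contain the same bigram twice. None of the three options changes the model; they only change how hypotheses are pruned and ranked during decoding.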

Appears in Collections: Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
