Detailed Information

Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models

Authors
Lee, Chanhee; Yang, Kisu; Whang, Taesun; Park, Chanjun; Matteson, Andrew; Lim, Heuiseok
Issue Date
Mar-2021
Publisher
MDPI
Keywords
cross-lingual; pretraining; language model; transfer learning; deep learning; RoBERTa
Citation
APPLIED SCIENCES-BASEL, v.11, no.5
Indexed
SCIE
SCOPUS
Journal Title
APPLIED SCIENCES-BASEL
Volume
11
Number
5
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/128496
DOI
10.3390/app11051974
ISSN
2076-3417
Abstract
Language model pretraining is an effective method for improving the performance of downstream natural language processing tasks. Although language modeling is unsupervised, and collecting data for it is therefore relatively inexpensive, it remains a challenging process for languages with limited resources. This results in a significant technological disparity between high- and low-resource languages for numerous downstream natural language processing tasks. In this paper, we aim to make this technology more accessible by enabling data-efficient training of pretrained language models. We achieve this by formulating language modeling of low-resource languages as a domain adaptation task using transformer-based language models pretrained on corpora of high-resource languages. Our novel cross-lingual post-training approach selectively reuses parameters of the language model trained on a high-resource language and post-trains them while learning language-specific parameters in the low-resource language. We also propose implicit translation layers that can learn linguistic differences between languages at a sequence level. To evaluate our method, we post-train a RoBERTa model pretrained on English and conduct a case study for the Korean language. Quantitative results from intrinsic and extrinsic evaluations show that our method outperforms several massively multilingual and monolingual pretrained language models in most settings and improves data efficiency by a factor of up to 32 compared to monolingual training.
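
The general idea outlined in the abstract can be illustrated with a minimal sketch: reuse the Transformer body of a high-resource-language model, re-initialize language-specific embeddings for the target language, and attach extra layers as stand-ins for the implicit translation layers. This is not the authors' implementation; the Korean vocabulary size (KO_VOCAB_SIZE), the parameter-freezing schedule, and the use of standard Transformer encoder layers as translation layers are assumptions for illustration only.

```python
# Hedged sketch of cross-lingual post-training, assuming a HuggingFace RoBERTa
# checkpoint and a hypothetical target-language (Korean) vocabulary.
import torch.nn as nn
from transformers import RobertaModel

KO_VOCAB_SIZE = 32000  # hypothetical size of the Korean subword vocabulary

model = RobertaModel.from_pretrained("roberta-base")
hidden = model.config.hidden_size

# Reuse the Transformer body pretrained on the high-resource language and keep it
# trainable so it can be post-trained on the low-resource language.
for p in model.encoder.parameters():
    p.requires_grad = True

# Learn language-specific parameters: replace the input embeddings with a freshly
# initialized table sized for the low-resource language's vocabulary.
model.embeddings.word_embeddings = nn.Embedding(KO_VOCAB_SIZE, hidden)

# "Implicit translation layers" (an assumption here): additional Transformer
# encoder layers intended to absorb sequence-level linguistic differences,
# e.g. placed before and after the reused encoder during post-training.
translation_in = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
translation_out = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
```

In this sketch, post-training would proceed with a masked language modeling objective on the low-resource corpus, updating the reused body, the new embeddings, and the translation layers together or in stages.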
Appears in
Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
