BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders
- Authors
- Lee, Jeongwoo; Moon, Hyeonseok; Park, Chanjun; Seo, Jaehyung; Eo, Sugyeong; Lim, Heuiseok
- Issue Date
- July 2022
- Publisher
- MDPI
- Keywords
- artificial intelligence; deep learning; natural language processing; machine reading comprehension; data augmentation
- Citation
- APPLIED SCIENCES-BASEL, v.12, no.13
- Indexed
- SCIE; SCOPUS
- Journal Title
- APPLIED SCIENCES-BASEL
- Volume
- 12
- Number
- 13
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/142933
- DOI
- 10.3390/app12136686
- ISSN
- 2076-3417
- Abstract
- Recent studies have attempted to build models that understand natural language and infer answers. Machine reading comprehension is one representative task, and several related datasets have been released. However, there are few official open datasets for the Test of English for International Communication (TOEIC), which is widely used to evaluate English proficiency, and research toward further advancement is not being actively conducted. We attribute the difficulty of deep learning research on TOEIC to this data scarcity problem and therefore propose two data augmentation methods that improve the model in a low-resource environment. Reflecting the attributes of the semantic and grammar problem types in TOEIC, the proposed methods use POS tagging and lemmatization to augment data that closely resemble real TOEIC problems. In addition, through experiments on each proposed method and on varying amounts of data, we confirm the importance of understanding semantics and grammar in TOEIC. The proposed methods address the data shortage problem of TOEIC and enable acceptable human-level performance.
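To illustrate the kind of augmentation the abstract describes, below is a minimal sketch of how POS tagging and lemmatization could be used to generate grammar-type cloze questions from raw sentences. It assumes NLTK as the tagger/lemmatizer and groups surface forms by lemma to produce distractor choices; all function names and the overall pipeline are illustrative assumptions, not the authors' released code.

```python
# Hypothetical augmentation sketch: blank out a token and use other observed
# surface forms of its lemma as distractors (illustrative only, not BERTOEIC's code).
from collections import defaultdict

import nltk
from nltk.stem import WordNetLemmatizer

# One-time resource downloads (safe to re-run).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()


def penn_to_wordnet(tag: str) -> str:
    """Map Penn Treebank POS tags to WordNet POS codes for lemmatization."""
    if tag.startswith("V"):
        return "v"
    if tag.startswith("J"):
        return "a"
    if tag.startswith("R"):
        return "r"
    return "n"


def build_form_table(sentences):
    """Group surface forms observed in the corpus by their lemma."""
    forms = defaultdict(set)
    for sentence in sentences:
        for token, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
            lemma = lemmatizer.lemmatize(token.lower(), penn_to_wordnet(tag))
            forms[lemma].add(token.lower())
    return forms


def make_grammar_question(sentence, target_index, form_table):
    """Blank out one token and offer other forms of its lemma as distractors."""
    tokens = nltk.word_tokenize(sentence)
    token, tag = nltk.pos_tag(tokens)[target_index]
    lemma = lemmatizer.lemmatize(token.lower(), penn_to_wordnet(tag))
    distractors = sorted(form_table[lemma] - {token.lower()})
    stem = " ".join(tokens[:target_index] + ["_____"] + tokens[target_index + 1:])
    return {"question": stem, "answer": token, "choices": [token] + distractors}
```

Grouping distractors by lemma mirrors the intuition that grammar-type TOEIC items ask the test taker to pick the correct inflection of a single word; the paper's semantic-type augmentation would require a different (meaning-based) distractor source.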
- Appears in Collections
- Graduate School > Department of Computer Science and Engineering > 1. Journal Articles