구어체 적응 사전 학습을 통한 한국어 감정 분류 성능 향상
- Other Titles
- Improving Korean Emotion Classification via Colloquial-Adaptive Pretraining
- Authors
- 이정훈; 김동화; 노영빈; 강필성
- Issue Date
- 2021
- Publisher
- 대한산업공학회
- Keywords
- Natural Language Processing; Transfer Learning; Adaptive Pretraining; Multi-Emotion Classification
- Citation
- 대한산업공학회지, v.47, no.4, pp. 342-350
- Indexed
- KCI
- Journal Title
- 대한산업공학회지
- Volume
- 47
- Number
- 4
- Start Page
- 342
- End Page
- 350
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/144771
- ISSN
- 1225-0988
- Abstract
- Language models (LMs) pretrained on a large text corpus and fine-tuned on task data achieve remarkable performance on document classification tasks. Recently, adaptive pretraining, which re-pretrains a pretrained LM on an additional dataset from the same domain as the target task to bridge the domain discrepancy, has reported significant performance improvements. However, current adaptive pretraining methods focus only on the domain gap between the pretraining data and the fine-tuning data. The writing style also differs: pretraining data such as Wikipedia is written in a literary style, whereas task data such as customer reviews is usually written in a colloquial style. In this work, we propose a colloquial-adaptive pretraining method that re-pretrains the pretrained LM on informal sentences to generalize the LM to colloquial style. We validate the proposed method on multi-emotion classification datasets. The experimental results show that the proposed method yields improved classification performance in both low- and high-resource settings.
- Appears in
Collections - College of Engineering > School of Industrial and Management Engineering > 1. Journal Articles