A Comparative Study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification Tasks

김진성; 김경민; 손준영; 박정배; 임희석

doi:10.15207/JKCS.2021.12.12.039

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	김진성	-
dc.contributor.author	김경민	-
dc.contributor.author	손준영	-
dc.contributor.author	박정배	-
dc.contributor.author	임희석	-
dc.date.accessioned	2022-03-11T14:41:01Z	-
dc.date.available	2022-03-11T14:41:01Z	-
dc.date.created	2022-01-20	-
dc.date.issued	2021	-
dc.identifier.issn	2233-4890	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/138598	-
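dc.description.abstract	Building high-quality input features through effective segmentation is an essential step in improving a language model's sentence comprehension, and the quality of those input features directly affects performance on downstream tasks. This paper compares segmentation strategies that effectively reflect the linguistic characteristics of Korean from the perspective of word and sentence classification. Segmentation types are divided by linguistic unit into four: eojeol (whitespace-delimited word), morpheme, syllable, and jamo (individual Hangul letter), and pre-training is performed with the RoBERTa model architecture. The downstream tasks are split by classification unit into a sentence classification group and a word classification group, so that trends within each group and differences between the groups can be analyzed. In the experiments, for sentence classification the model using the jamo-level linguistic segmentation strategy outperforms the other segmentation strategies by up to +0.62% on NSMC, +2.38% on KorNLI, and +2.41% on KorSTS, while for word classification the syllable-level strategy outperforms the others by up to +0.7% on NER and +0.61% on SRL, demonstrating the effectiveness of each strategy within its classification group.	-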
dc.language	Korean	-
dc.language.iso	ko	-
dc.publisher	Korea Convergence Society	-
dc.title	A Study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification Tasks	-
dc.title.alternative	A Comparative Study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification Tasks	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	임희석	-
dc.identifier.doi	10.15207/JKCS.2021.12.12.039	-
dc.identifier.bibliographicCitation	Journal of the Korea Convergence Society, v.12, no.12, pp.39 - 47	-
dc.relation.isPartOf	Journal of the Korea Convergence Society	-
dc.citation.title	Journal of the Korea Convergence Society	-
dc.citation.volume	12	-
dc.citation.number	12	-
dc.citation.startPage	39	-
dc.citation.endPage	47	-
dc.type.rims	ART	-
dc.identifier.kciid	ART002787030	-
dc.description.journalClass	2	-
dc.description.journalRegisteredClass	kci	-
dc.subject.keywordAuthor	Linguistic segmentation	-
dc.subject.keywordAuthor	Natural language processing	-
dc.subject.keywordAuthor	Pre-trained language model	-
dc.subject.keywordAuthor	Sentence classification	-
dc.subject.keywordAuthor	Tokenization	-
dc.subject.keywordAuthor	Word classification	-
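dc.subject.keywordAuthor	단어 분류 (word classification)	-
dc.subject.keywordAuthor	문장 분류 (sentence classification)	-
dc.subject.keywordAuthor	사전학습 언어모델 (pre-trained language model)	-
dc.subject.keywordAuthor	언어학적 분절 (linguistic segmentation)	-
dc.subject.keywordAuthor	자연어 처리 (natural language processing)	-
dc.subject.keywordAuthor	토큰화 (tokenization)	-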
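Three of the four segmentation units named in the abstract can be illustrated with a short Python sketch. This is not the authors' preprocessing code: the function names are hypothetical, morpheme segmentation is omitted because it requires an external morphological analyzer, and this record does not say whether a subword vocabulary was learned on top of these units. The jamo split assumes Unicode NFD normalization, which decomposes each precomposed Hangul syllable into its conjoining initial consonant, vowel, and optional final consonant.

```python
import unicodedata

HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable '가'
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable '힣'


def eojeol_segment(text: str) -> list[str]:
    """Hypothetical helper: eojeol units are whitespace-delimited words."""
    return text.split()


def syllable_segment(text: str) -> list[str]:
    """Hypothetical helper: one token per character, whitespace dropped."""
    return [ch for ch in text if not ch.isspace()]


def jamo_segment(text: str) -> list[str]:
    """Hypothetical helper: split each Hangul syllable into 2 or 3 jamo."""
    tokens: list[str] = []
    for ch in text:
        if ch.isspace():
            continue
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            # NFD yields conjoining jamo (U+1100.., U+1161.., U+11A8..)
            tokens.extend(unicodedata.normalize("NFD", ch))
        else:
            # Non-Hangul characters (digits, Latin letters) pass through
            tokens.append(ch)
    return tokens


if __name__ == "__main__":
    sentence = "한국어 문장 분류"
    print(eojeol_segment(sentence))     # ['한국어', '문장', '분류']
    print(syllable_segment(sentence))   # ['한', '국', '어', '문', '장', '분', '류']
    print(len(jamo_segment(sentence)))  # 19
```

For the sample sentence the eojeol split gives 3 tokens, the syllable split 7, and the jamo split 19, which shows roughly how finer units shrink the vocabulary while lengthening the input sequence.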

Files in This Item: There are no files associated with this item.

Appears in Collections: Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

(02841) 145 Anam-ro, Seongbuk-gu, Seoul; Tel. 02-3290-1114

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.