Detailed Information


Authorship attribution of Korean texts by using phrase patterns

Authors
Lee, J.; Choe, J.-W.; Jin, M.
Issue Date
2017
Publisher
International Information Institute Ltd.
Keywords
Authorship attribution; Classification; Korean; Machine learning
Citation
Information (Japan), v.20, no.1, pp.417 - 428
Indexed
SCOPUS
Journal Title
Information (Japan)
Volume
20
Number
1
Start Page
417
End Page
428
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/86091
ISSN
1343-4500
Abstract
Applying machine learning to text classification follows the usual machine learning process, except that text requires its own preprocessing step to derive feature datasets. Proposing new feature datasets is therefore an important research topic in stylometrics. In this paper, we propose using phrase patterns to build a feature dataset for authorship attribution of Korean texts. The main idea of phrase patterns is to synthesize the tag and the BUNSETSU (Jin, 2013). Although Korean and Japanese are highly similar, for example in the order and structure of sentences and in the role of JYOSHI, there has been no research on phrase patterns of Korean texts to date. We focused on the fact that the GOSETSU ('Eojeol' in Korean) plays the same role as the BUNSETSU in Japanese. The corpus comprised 160 editorials from 4 authors and 200 essays from 10 authors. In a simulation study with 5 feature datasets and 5 classifiers, phrase patterns performed well for authorship attribution of Korean texts. Moreover, for four of the five classifiers there was no statistically significant mean difference between phrase-pattern and non-phrase-pattern input datasets. Of the five classifiers, RF yielded the highest accuracy, which is consistent with previous studies on Japanese texts. Notably, LMT performed better than SVM. © 2017 International Information Institute.
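As a rough illustration of how a phrase-pattern feature dataset might be derived, the sketch below takes eojeols that have already been split into (morpheme, tag) pairs and turns each one into a pattern string. The abstract does not spell out the exact rule, so the rule used here (keep particles in surface form, replace every other morpheme with its tag) is an assumption, as are the tag labels, the `PARTICLE_TAGS` set, and the function names `phrase_pattern` and `pattern_frequencies`.

```python
from collections import Counter

# Hypothetical set of particle tags (Sejong-style labels, for illustration only).
PARTICLE_TAGS = {"JKS", "JKO", "JKB", "JX"}


def phrase_pattern(eojeol):
    """Turn one eojeol, given as [(morpheme, tag), ...], into a pattern string.

    Assumed rule: particles keep their surface form, while every other
    morpheme is replaced by its part-of-speech tag.
    """
    parts = [morph if tag in PARTICLE_TAGS else tag for morph, tag in eojeol]
    return "+".join(parts)


def pattern_frequencies(text):
    """Relative frequency of each phrase pattern in a text (a list of eojeols)."""
    counts = Counter(phrase_pattern(eojeol) for eojeol in text)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


# Toy, hand-tagged input; a real pipeline would use a morphological analyzer.
sample = [
    [("나", "NP"), ("는", "JX")],
    [("책", "NNG"), ("을", "JKO")],
    [("읽", "VV"), ("었", "EP"), ("다", "EF")],
    [("너", "NP"), ("는", "JX")],
]
features = pattern_frequencies(sample)
```

Each text's frequency dictionary could then be aligned into a fixed-length vector and passed to any standard classifier; that step is omitted here because the abstract does not describe it.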
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Liberal Arts > Department of Linguistics > 1. Journal Articles