Exploring the Thematic Structure in Corpora with Topic Modeling (토픽모델링을 이용한 코퍼스의 주제구조 탐색)
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 홍정하 | - |
dc.contributor.author | 최재웅 | - |
dc.date.accessioned | 2021-09-03T13:54:47Z | - |
dc.date.available | 2021-09-03T13:54:47Z | - |
dc.date.created | 2021-06-17 | - |
dc.date.issued | 2017 | - |
dc.identifier.issn | 1598-1886 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/86015 | - |
dc.description.abstract | This paper demonstrates, from a corpus-linguistic perspective, the applicability of topic modeling, a technique that can organize and summarize large archives of texts. We investigate the thematic structures that an R package implementing LDA-based (latent Dirichlet allocation) topic modeling uncovers in the Brown Corpus, and we use techniques such as comparison clouds, principal component analysis, and phylogenetic trees to analyze and visualize the results. The paper shows (i) that the Brown Corpus has a core thematic structure dividing texts that tend toward past tense and spoken language from texts that tend toward present tense and written language; (ii) that the former texts are mainly about women, home, and battle, while the latter primarily concern the humanities, society, and the economy; and (iii) that the linguistic texts reveal an interdisciplinary character, related to mathematics and engineering as well as to the humanities and social sciences. | - |
dc.language | Korean | - |
dc.language.iso | ko | - |
dc.publisher | 서강대학교 언어정보연구소 | - |
dc.title | 토픽모델링을 이용한 코퍼스의 주제구조 탐색 | - |
dc.title.alternative | Exploring the Thematic Structure in Corpora with Topic Modeling | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | 최재웅 | - |
dc.identifier.doi | 10.29211/soli.2017.30..009 | - |
dc.identifier.bibliographicCitation | 언어와 정보 사회, v.30, pp.239 - 276 | - |
dc.relation.isPartOf | 언어와 정보 사회 | - |
dc.citation.title | 언어와 정보 사회 | - |
dc.citation.volume | 30 | - |
dc.citation.startPage | 239 | - |
dc.citation.endPage | 276 | - |
dc.type.rims | ART | - |
dc.identifier.kciid | ART002214526 | - |
dc.description.journalClass | 2 | - |
dc.description.journalRegisteredClass | kci | - |
dc.subject.keywordAuthor | 토픽모델링 | - |
dc.subject.keywordAuthor | LDA알고리즘 | - |
dc.subject.keywordAuthor | 주제구조 | - |
dc.subject.keywordAuthor | 텍스트분류 | - |
dc.subject.keywordAuthor | 불용어 | - |
dc.subject.keywordAuthor | 비교클라우드 | - |
dc.subject.keywordAuthor | 주성분분석 | - |
dc.subject.keywordAuthor | 계통수도 | - |
dc.subject.keywordAuthor | topic modeling | - |
dc.subject.keywordAuthor | LDA(latent Dirichlet allocation) | - |
dc.subject.keywordAuthor | thematic structure | - |
dc.subject.keywordAuthor | text classification | - |
dc.subject.keywordAuthor | stop word | - |
dc.subject.keywordAuthor | comparison cloud | - |
dc.subject.keywordAuthor | principal component analysis | - |
dc.subject.keywordAuthor | phylogenetic tree | - |