토픽모델링을 이용한 코퍼스의 주제구조 탐색 (Exploring the Thematic Structure in Corpora with Topic Modeling)
- Other Titles
- Exploring the Thematic Structure in Corpora with Topic Modeling
- Authors
- 홍정하; 최재웅
- Issue Date
- 2017
- Publisher
- 서강대학교 언어정보연구소
- Keywords
- 토픽모델링; LDA알고리즘; 주제구조; 텍스트분류; 불용어; 비교클라우드; 주성분분석; 계통수도; topic modeling; LDA (latent Dirichlet allocation); thematic structure; text classification; stop word; comparison cloud; principal component analysis; phylogenetic tree
- Citation
- 언어와 정보 사회, v.30, pp.239 - 276
- Indexed
- KCI
- Journal Title
- 언어와 정보 사회
- Volume
- 30
- Start Page
- 239
- End Page
- 276
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/86015
- DOI
- 10.29211/soli.2017.30..009
- ISSN
- 1598-1886
- Abstract
- This paper aims to demonstrate, from a corpus-linguistic perspective, the applicability of topic modeling, a method that can organize and summarize large archives of texts. To do this, we investigate the thematic structures that an R package implementing LDA-based (latent Dirichlet allocation) topic modeling uncovers in the Brown Corpus, and we use statistical techniques such as comparison clouds, principal component analysis, and phylogenetic trees to analyze and visualize the results. The paper shows (i) that the Brown Corpus has a core thematic structure that divides into texts tending toward past tense and spoken language and texts tending toward present tense and written language; (ii) that the former texts are mainly about women, home, and battle, while the latter are primarily related to the humanities, society, and the economy; and (iii) that the linguistic texts reveal an interdisciplinary nature, related to mathematics and engineering as well as to the humanities and social sciences.
- Appears in Collections
- College of Liberal Arts > Department of Linguistics > 1. Journal Articles