Automatic Keyword Extraction and Analysis from the Large Scale Newspaper Corpus Based on t-score (대규모 신문 기사의 자동 키워드 추출과 분석 -t-점수를 이용하여-)
- Authors
- Kim, Ilhwan (김일환); Lee, Do-Gil (이도길)
- Issue Date
- 2011
- Publisher
- 한국어학회 (The Association for Korean Linguistics)
- Keywords
- keyword (키워드); keywordness (키워드성); keyword extraction (키워드 추출); frequency of use (사용 빈도); t-score (t-점수); Trends 21 corpus ([물결 21] 코퍼스); newspaper articles (신문 기사)
- Citation
- Korean Linguistics (한국어학), v.53, pp.145-194
- Indexed
- KCI
- Journal Title
- Korean Linguistics (한국어학)
- Volume
- 53
- Start Page
- 145
- End Page
- 194
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/113705
- ISSN
- 1226-9123
- Abstract
- Kim, Ilhwan & Lee, Do-Gil. 2011. 11. Automatic Keyword Extraction and Analysis from the Large Scale Newspaper Corpus Based on t-score. Korean Linguistics 53, 145-194. As the types and volume of documents have increased radically in recent years, automatically extracting appropriate keywords from them has become important. This paper proposes an automatic method for extracting keywords and analyzes their characteristics. The keywords are extracted from the Trends 21 corpus, a collection of four major Korean daily newspapers published from 2000 to 2009.
We introduce the t-score as a measure of keywordness. Keywords were extracted along two dimensions: year and topic. We present the top 100 keywords for 6 topics and 10 years. To verify whether these keywords are representative of the texts, we compared them with the headline news of 2009. This work makes two main contributions: 1) it presents keywords extracted automatically from large-scale corpora, without human intervention, by a verifiable and objective method; and 2) it analyzes the characteristics of the keywords by topic and year.
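The abstract does not state the exact t-score formula the authors use. As a rough illustration only, the sketch below assumes a formulation common in corpus linguistics, t = (O - E) / sqrt(O), where O is a word's observed frequency in a target subcorpus (for example, one year or one topic) and E is the frequency expected from its relative frequency in a reference corpus. The function names `t_score` and `top_keywords` and the toy counts are hypothetical, not taken from the paper.

```python
import math


def t_score(freq_target, size_target, freq_ref, size_ref):
    """Assumed t-score: (observed - expected) / sqrt(observed).

    freq_target / size_target: word count and total tokens in the target subcorpus.
    freq_ref / size_ref: word count and total tokens in the reference corpus.
    """
    if freq_target == 0:
        return 0.0  # no observations, so no evidence of keywordness
    expected = freq_ref * size_target / size_ref
    return (freq_target - expected) / math.sqrt(freq_target)


def top_keywords(target_counts, ref_counts, n=100):
    """Rank the words of the target subcorpus by descending t-score."""
    size_target = sum(target_counts.values())
    size_ref = sum(ref_counts.values())
    scores = {
        word: t_score(freq, size_target, ref_counts.get(word, 0), size_ref)
        for word, freq in target_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy counts (invented for illustration, not data from the Trends 21 corpus).
target = {"election": 50, "economy": 30, "the": 100}
reference = {"election": 60, "economy": 300, "the": 1000}
ranking = top_keywords(target, reference, n=3)
```

With these toy counts, "election" ranks first because it occurs far more often in the target subcorpus than its share of the reference corpus would predict, while the frequent but evenly spread "the" receives a negative score.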
- Appears in Collections
- Associate Research Center > Research Institute of Korean Studies > 1. Journal Articles