언어의 공기관계 분석을 위한 임의화검증의 응용 (Applying Randomization Tests to Collocation Analyses in Large Corpora)
- Other Titles
- Applying Randomization Tests to Collocation Analyses in Large Corpora
- Authors
- 양경숙; 김희영
- Issue Date
- 2005
- Publisher
- 한국통계학회
- Keywords
- Co-occurrence; Collocation; Association; Chi-square statistic; Mutual information
- Citation
- 응용통계연구, v.18, no.3, pp.583 - 595
- Indexed
- KCI
- Journal Title
- 응용통계연구
- Volume
- 18
- Number
- 3
- Start Page
- 583
- End Page
- 595
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/126068
- ISSN
- 1225-066X
- Abstract
- Contingency tables of n-gram counts are used to decide whether an n-gram is a true collocation, meaning that the words that make up the n-gram are highly associated in the text. Several statistical measures are used to identify collocations, including the Kulczinsky coefficient, the Ochiai coefficient, the Fager and McGowan coefficient, the Yule coefficient, mutual information, and the chi-square statistic. The main problem is that these measures assume a normal or approximately normal distribution of the variables being sampled. While this assumption is valid in most instances, it does not hold when comparing the rates of occurrence of rare events, and texts are composed mostly of rare events. In this paper we briefly review several statistics for testing the association between two words and propose randomization tests for evaluating significance levels in collocation analyses of large corpora. A related graph can be used to compare different test statistics applied to the same contingency table.
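The idea summarized in the abstract, judging a 2x2 contingency table for a word pair by randomization rather than by a normal-theory approximation, can be illustrated with a minimal sketch. This is not the authors' procedure, which the record does not reproduce. It is a generic Monte Carlo permutation test that holds the table margins fixed, and the function names (`chi_square`, `pmi`, `randomization_test`) and the example counts are hypothetical choices made only for illustration.

```python
import math
import random


def chi_square(n11, n12, n21, n22):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = n11 + n12 + n21 + n22
    denom = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    if denom == 0:  # an empty margin leaves the association undefined
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


def pmi(n11, n12, n21, n22):
    """Pointwise mutual information: log2 of observed over expected n11."""
    n = n11 + n12 + n21 + n22
    if n11 == 0:
        return float("-inf")
    return math.log2(n11 * n / ((n11 + n12) * (n11 + n21)))


def randomization_test(table, statistic, n_perm=2000, seed=0):
    """One-sided Monte Carlo p-value for a 2x2 table with fixed margins.

    table = (n11, n12, n21, n22), where n11 counts bigrams "w1 w2",
    n12 counts "w1, not w2", n21 counts "not w1, w2", n22 the rest.
    """
    n11, n12, n21, n22 = table
    n = n11 + n12 + n21 + n22
    r1 = n11 + n12  # bigrams whose first word is w1
    c1 = n11 + n21  # bigrams whose second word is w2
    observed = statistic(n11, n12, n21, n22)

    # One label per bigram: 1 if its second word is w2, else 0.
    labels = [1] * c1 + [0] * (n - c1)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Drawing r1 labels without replacement re-pairs first and second
        # words at random while keeping both margins unchanged.
        k = sum(rng.sample(labels, r1))
        if statistic(k, r1 - k, c1 - k, n - r1 - c1 + k) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    table = (8, 12, 22, 958)  # hypothetical counts, not data from the paper
    for name, stat in (("chi-square", chi_square), ("PMI", pmi)):
        observed, p = randomization_test(table, stat)
        print(f"{name}: statistic={observed:.3f}, p={p:.4f}")
```

Because the same resampled tables can be scored with any statistic, the sketch also shows how different association measures might be compared on one contingency table, as the abstract suggests. The smallest p-value it can report is 1 / (n_perm + 1), so `n_perm` bounds the resolution of the test.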