Correlated variable importance for random forests
- Authors
- Shin, Seung Beom; Cho, Hyung Jun
- Issue Date
- Apr-2021
- Publisher
- KOREAN STATISTICAL SOC
- Keywords
- correlation; random forests; variable importance
- Citation
- KOREAN JOURNAL OF APPLIED STATISTICS, v.34, no.2, pp.177 - 190
- Indexed
- KCI
- Journal Title
- KOREAN JOURNAL OF APPLIED STATISTICS
- Volume
- 34
- Number
- 2
- Start Page
- 177
- End Page
- 190
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/137712
- DOI
- 10.5351/KJAS.2021.34.2.177
- ISSN
- 1225-066X
- Abstract
- Random forests is a popular method that reduces the instability and improves the accuracy of decision trees through ensembling. The gain in accuracy comes at the cost of ease of interpretation; to compensate, random forests provide variable importance, which indicates how much each predictor contributes to constructing the forest. However, when a predictor is correlated with other predictors, the importance computed by the existing algorithm may be distorted: correlated predictors suffer a downward bias, which can understate the importance of truly important predictors. We propose a new algorithm that remedies this downward bias of correlated predictors. The performance of the proposed algorithm is demonstrated on simulated data and illustrated with real data.
- Appears in Collections
- College of Political Science & Economics > Department of Statistics > 1. Journal Articles
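The abstract does not spell out the proposed algorithm, so the sketch below only illustrates the problem it targets. It shows how a standard impurity-based (MDI) importance for a random forest splits credit between a truly important predictor x1 and a near-copy x2, giving x1 a lower score than when x1 appears alone. Everything here is an assumption for illustration, not the authors' method: the forest uses depth-1 regression trees (stumps) fit on bootstrap samples, each tree may split on only one randomly chosen predictor (analogous to max_features=1), the simulated design y = 2*x1 + noise is hypothetical, and all function names are hypothetical.

```python
import random


def stump_gain(xs, ys):
    """Impurity decrease (reduction in sum of squared errors) of the best
    single-threshold regression split on one feature; 0.0 if none exists."""
    n = len(ys)
    pairs = sorted(zip(xs, ys))
    total_sum = sum(ys)
    total_sq = sum(y * y for y in ys)
    parent_sse = total_sq - total_sum ** 2 / n
    best_gain = 0.0
    left_sum = left_sq = 0.0
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev          # left child holds the first i points
        left_sq += y_prev * y_prev
        if pairs[i][0] == x_prev:
            continue                # no threshold separates tied values
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        best_gain = max(best_gain, parent_sse - sse)
    return best_gain


def stump_forest_importance(X, y, n_trees=200, seed=0):
    """MDI importance for a hypothetical forest of stumps.

    Assumption: each tree sees a bootstrap sample and may split on a single
    randomly chosen predictor, mimicking max_features=1.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    totals = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        j = rng.randrange(p)                         # random candidate predictor
        totals[j] += stump_gain([X[i][j] for i in idx], [y[i] for i in idx])
    s = sum(totals)
    return [t / s for t in totals]                   # normalise to sum to 1


def make_data(n=300, with_copy=False, seed=1):
    """Hypothetical design: y depends on x1 only; z is pure noise;
    optionally add x2, a near-copy of x1 (correlation close to 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        row = [x1, z]
        if with_copy:
            row.insert(1, x1 + rng.gauss(0, 0.1))   # columns become [x1, x2, z]
        X.append(row)
        y.append(2 * x1 + rng.gauss(0, 0.5))
    return X, y


if __name__ == "__main__":
    imp_alone = stump_forest_importance(*make_data(with_copy=False))
    imp_corr = stump_forest_importance(*make_data(with_copy=True))
    print("x1 importance, x1 alone:        ", round(imp_alone[0], 3))
    print("x1 importance, with near-copy x2:", round(imp_corr[0], 3))
```

Because x1 and x2 carry almost the same information, trees that happen to draw x2 achieve nearly the same impurity decrease as trees that draw x1, so the total credit for one underlying signal is shared between them. This is one simple mechanism behind the downward bias of correlated predictors described in the abstract; full-depth trees and other importance measures (for example permutation importance) can behave differently.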
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.