Detailed Information

Practical approach to determine sample size for building logistic prediction models using high-throughput data

Full metadata record
DC Field | Value | Language
dc.contributor.author | Son, Dae-Soon | -
dc.contributor.author | Lee, DongHyuk | -
dc.contributor.author | Lee, Kyusang | -
dc.contributor.author | Jung, Sin-Ho | -
dc.contributor.author | Ahn, Taejin | -
dc.contributor.author | Lee, Eunjin | -
dc.contributor.author | Sohn, Insuk | -
dc.contributor.author | Chung, Jongsuk | -
dc.contributor.author | Park, Woongyang | -
dc.contributor.author | Huh, Nam | -
dc.contributor.author | Lee, Jae Won | -
dc.date.accessioned | 2021-09-04T19:44:09Z | -
dc.date.available | 2021-09-04T19:44:09Z | -
dc.date.created | 2021-06-15 | -
dc.date.issued | 2015-02 | -
dc.identifier.issn | 1532-0464 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/94561 | -
dc.description.abstract | An empirical method of sample size determination for building prediction models was proposed recently. The permutation method used in this procedure is commonly applied to address overfitting during cross-validation when evaluating the performance of prediction models constructed from microarray data. However, a major drawback of such methods, which include bootstrapping and full permutation, is the prohibitively high computational cost of calculating the sample size. In this paper, we propose using a single representative null distribution instead of a full permutation, and we support this proposal with both simulated and real data sets. In simulation, we used a data set with zero effect size and confirmed that the empirical type I error approaches 0.05. Hence, this method can be applied with confidence to reduce the overfitting problem during cross-validation. We observed that a pilot data set generated by random sampling from real data can be used successfully for sample size determination. We present results from an experiment repeated 300 times, which were comparable to those of the full permutation method. Because we eliminate the full permutation, sample size estimation time is not a function of pilot data size. In our experiment, this process took around 30 min. With the increasing number of clinical studies, developing efficient sample size determination methods for building prediction models is critical. However, empirical methods using bootstrap and permutation usually involve high computing costs. In this study, we propose a method that drastically reduces the required computing time by using a representative null distribution of permutations. We use data from pilot experiments to apply this method to the efficient design of clinical studies with high-throughput data. (C) 2014 Elsevier Inc. All rights reserved. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | ACADEMIC PRESS INC ELSEVIER SCIENCE | -
dc.subject | FALSE DISCOVERY RATE | -
dc.subject | LINEAR DISCRIMINANT-ANALYSIS | -
dc.subject | NEGATIVE BREAST-CANCER | -
dc.subject | HIGH-DIMENSIONAL DATA | -
dc.subject | MICROARRAY EXPERIMENTS | -
dc.subject | POWER | -
dc.title | Practical approach to determine sample size for building logistic prediction models using high-throughput data | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Lee, Jae Won | -
dc.identifier.doi | 10.1016/j.jbi.2014.12.010 | -
dc.identifier.scopusid | 2-s2.0-84924501368 | -
dc.identifier.wosid | 000351483600039 | -
dc.identifier.bibliographicCitation | JOURNAL OF BIOMEDICAL INFORMATICS, v.53, pp.355 - 362 | -
dc.relation.isPartOf | JOURNAL OF BIOMEDICAL INFORMATICS | -
dc.citation.title | JOURNAL OF BIOMEDICAL INFORMATICS | -
dc.citation.volume | 53 | -
dc.citation.startPage | 355 | -
dc.citation.endPage | 362 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Medical Informatics | -
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | -
dc.relation.journalWebOfScienceCategory | Medical Informatics | -
dc.subject.keywordPlus | FALSE DISCOVERY RATE | -
dc.subject.keywordPlus | LINEAR DISCRIMINANT-ANALYSIS | -
dc.subject.keywordPlus | NEGATIVE BREAST-CANCER | -
dc.subject.keywordPlus | HIGH-DIMENSIONAL DATA | -
dc.subject.keywordPlus | MICROARRAY EXPERIMENTS | -
dc.subject.keywordPlus | POWER | -
dc.subject.keywordAuthor | Sample size | -
dc.subject.keywordAuthor | Statistical power | -
dc.subject.keywordAuthor | Prediction and validation | -
dc.subject.keywordAuthor | Permutation | -
dc.subject.keywordAuthor | Null distribution | -
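
Illustrative sketch (not part of the record): the abstract's central idea, reusing a single representative null distribution instead of re-permuting every data set, can be illustrated with a minimal Python example. This is not the authors' procedure. As assumptions made purely for illustration, it replaces the cross-validated performance of a logistic prediction model with a simple absolute difference in group means, draws zero-effect data from a standard normal distribution, and uses hypothetical function names and parameter values throughout. It builds one null distribution by permuting the labels of a single pilot data set, then checks how often fresh zero-effect data sets exceed the resulting 5% critical value.

```python
import random
import statistics


def mean_diff(values, labels):
    """Absolute difference in group means (a stand-in test statistic)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(statistics.fmean(g1) - statistics.fmean(g0))


def null_dataset(rng, n_per_group):
    """Zero effect size: both groups are drawn from the same N(0, 1)."""
    values = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
    labels = [0] * n_per_group + [1] * n_per_group
    return values, labels


def reference_critical_value(rng, values, labels, n_perm=1000, alpha=0.05):
    """Permute the labels of ONE pilot data set; return the (1 - alpha) quantile."""
    shuffled = labels[:]
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_stats.append(mean_diff(values, shuffled))
    null_stats.sort()
    index = min(n_perm - 1, int(round((1 - alpha) * n_perm)))
    return null_stats[index]


def empirical_type1_error(seed=0, n_per_group=100, n_trials=1000):
    """Fraction of zero-effect data sets rejected using the single reference null."""
    rng = random.Random(seed)
    pilot_values, pilot_labels = null_dataset(rng, n_per_group)
    critical = reference_critical_value(rng, pilot_values, pilot_labels)
    rejections = 0
    for _ in range(n_trials):
        values, labels = null_dataset(rng, n_per_group)
        # One statistic evaluation per data set; no new permutations are run.
        if mean_diff(values, labels) > critical:
            rejections += 1
    return rejections / n_trials


rate = empirical_type1_error()
print(f"empirical type I error: {rate:.3f}")
```

In this sketch each new data set costs one statistic evaluation rather than `n_perm` of them, which is the source of the saving. The printed rate should land in the neighborhood of 0.05, though it depends on the particular pilot data set used to build the reference distribution.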
Appears in Collections
College of Political Science & Economics > Department of Statistics > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

LEE, JAE WON
College of Political Science & Economics (Department of Statistics)