Practical approach to determine sample size for building logistic prediction models using high-throughput data

Son, Dae-Soon; Lee, DongHyuk; Lee, Kyusang; Jung, Sin-Ho; Ahn, Taejin; Lee, Eunjin; Sohn, Insuk; Chung, Jongsuk; Park, Woongyang; Huh, Nam; Lee, Jae Won

doi:10.1016/j.jbi.2014.12.010

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Son, Dae-Soon	-
dc.contributor.author	Lee, DongHyuk	-
dc.contributor.author	Lee, Kyusang	-
dc.contributor.author	Jung, Sin-Ho	-
dc.contributor.author	Ahn, Taejin	-
dc.contributor.author	Lee, Eunjin	-
dc.contributor.author	Sohn, Insuk	-
dc.contributor.author	Chung, Jongsuk	-
dc.contributor.author	Park, Woongyang	-
dc.contributor.author	Huh, Nam	-
dc.contributor.author	Lee, Jae Won	-
dc.date.accessioned	2021-09-04T19:44:09Z	-
dc.date.available	2021-09-04T19:44:09Z	-
dc.date.created	2021-06-15	-
dc.date.issued	2015-02	-
dc.identifier.issn	1532-0464	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/94561	-
dc.description.abstract	An empirical method of sample size determination for building prediction models was proposed recently. The permutation method used in this procedure is a common way to address overfitting during cross-validation when evaluating the performance of prediction models constructed from microarray data. However, a major drawback of such methods, which include bootstrapping and full permutation, is the prohibitively high computational cost of calculating the sample size. In this paper, using both simulated and real data sets, we propose that a single representative null distribution can be used instead of a full permutation. In simulations with a dataset of zero effect size, we confirmed that the empirical type I error approaches 0.05. Hence this method can be confidently applied to reduce the overfitting problem during cross-validation. We observed that a pilot data set generated by random sampling from real data could be used successfully for sample size determination. We present results from an experiment repeated 300 times, which produced results comparable to those of the full permutation method. Because full permutation is eliminated, sample size estimation time does not depend on the size of the pilot data; in our experiment, the process took about 30 min. With the increasing number of clinical studies, developing efficient sample size determination methods for building prediction models is critical, but empirical methods using bootstrap and permutation usually involve high computing costs. In this study, we propose a method that drastically reduces the required computing time by using a representative null distribution of permutations. We use data from pilot experiments to apply this method to the efficient design of clinical studies with high-throughput data. (C) 2014 Elsevier Inc. All rights reserved.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	ACADEMIC PRESS INC ELSEVIER SCIENCE	-
dc.subject	FALSE DISCOVERY RATE	-
dc.subject	LINEAR DISCRIMINANT-ANALYSIS	-
dc.subject	NEGATIVE BREAST-CANCER	-
dc.subject	HIGH-DIMENSIONAL DATA	-
dc.subject	MICROARRAY EXPERIMENTS	-
dc.subject	POWER	-
dc.title	Practical approach to determine sample size for building logistic prediction models using high-throughput data	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Lee, Jae Won	-
dc.identifier.doi	10.1016/j.jbi.2014.12.010	-
dc.identifier.scopusid	2-s2.0-84924501368	-
dc.identifier.wosid	000351483600039	-
dc.identifier.bibliographicCitation	JOURNAL OF BIOMEDICAL INFORMATICS, v.53, pp.355 - 362	-
dc.relation.isPartOf	JOURNAL OF BIOMEDICAL INFORMATICS	-
dc.citation.title	JOURNAL OF BIOMEDICAL INFORMATICS	-
dc.citation.volume	53	-
dc.citation.startPage	355	-
dc.citation.endPage	362	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Medical Informatics	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Medical Informatics	-
dc.subject.keywordPlus	FALSE DISCOVERY RATE	-
dc.subject.keywordPlus	LINEAR DISCRIMINANT-ANALYSIS	-
dc.subject.keywordPlus	NEGATIVE BREAST-CANCER	-
dc.subject.keywordPlus	HIGH-DIMENSIONAL DATA	-
dc.subject.keywordPlus	MICROARRAY EXPERIMENTS	-
dc.subject.keywordPlus	POWER	-
dc.subject.keywordAuthor	Sample size	-
dc.subject.keywordAuthor	Statistical power	-
dc.subject.keywordAuthor	Prediction and validation	-
dc.subject.keywordAuthor	Permutation	-
dc.subject.keywordAuthor	Null distribution	-
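The abstract describes estimating sample size by testing cross-validated prediction accuracy against a permutation null distribution, and reusing one representative null distribution instead of running a full permutation for every resample. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. A nearest-centroid classifier stands in for the logistic model, the Gaussian simulator and every function name (`cv_accuracy`, `simulated_sampler`, `pilot_sampler`, `estimated_power`, `smallest_sample_size`) are hypothetical, and building the null from label permutations of a single resample is only one possible reading of "representative null distribution".

```python
import numpy as np


def cv_accuracy(X, y, n_folds=5, rng=None):
    """Stratified k-fold CV accuracy of a nearest-centroid classifier.

    Assumption: nearest centroid stands in for the paper's logistic model
    only to keep this sketch dependent on NumPy alone.
    """
    rng = np.random.default_rng(rng)
    idx0 = rng.permutation(np.flatnonzero(y == 0))
    idx1 = rng.permutation(np.flatnonzero(y == 1))
    # Each fold receives a share of both classes (stratification).
    folds = [np.concatenate(pair) for pair in
             zip(np.array_split(idx0, n_folds), np.array_split(idx1, n_folds))]
    correct = 0
    for test in folds:
        train = np.ones(len(y), dtype=bool)
        train[test] = False
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        d0 = ((X[test] - c0) ** 2).sum(axis=1)
        d1 = ((X[test] - c1) ** 2).sum(axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)


def simulated_sampler(n_features=50, n_informative=5, effect=1.5):
    """Hypothetical Gaussian two-class generator; effect=0 means no signal."""
    def sample(n_per_class, rng):
        X = rng.standard_normal((2 * n_per_class, n_features))
        y = np.repeat([0, 1], n_per_class)
        X[y == 1, :n_informative] += effect
        return X, y
    return sample


def pilot_sampler(X_pilot, y_pilot):
    """Draw class-balanced subsets, without replacement, from pilot data."""
    def sample(n_per_class, rng):
        i0 = rng.choice(np.flatnonzero(y_pilot == 0), n_per_class, replace=False)
        i1 = rng.choice(np.flatnonzero(y_pilot == 1), n_per_class, replace=False)
        idx = np.concatenate([i0, i1])
        return X_pilot[idx], y_pilot[idx]
    return sample


def estimated_power(sample, n_per_class, n_resamples=50, n_perm=100,
                    alpha=0.05, seed=0):
    """Fraction of resamples whose CV accuracy is significant at level alpha.

    Assumption: one representative null distribution (label permutations of
    a single resample) is reused for every resample of this size, instead
    of a fresh permutation test per resample.
    """
    rng = np.random.default_rng(seed)
    Xr, yr = sample(n_per_class, rng)
    null = np.array([cv_accuracy(Xr, rng.permutation(yr), rng=rng)
                     for _ in range(n_perm)])
    rejections = 0
    for _ in range(n_resamples):
        Xb, yb = sample(n_per_class, rng)
        observed = cv_accuracy(Xb, yb, rng=rng)
        # Permutation p-value with the usual +1 correction.
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        rejections += int(p_value <= alpha)
    return rejections / n_resamples


def smallest_sample_size(sample, candidates, target_power=0.8, **kwargs):
    """Smallest per-class size in `candidates` reaching `target_power`, else None."""
    for n in sorted(candidates):
        if estimated_power(sample, n, **kwargs) >= target_power:
            return n
    return None
```

For example, `smallest_sample_size(simulated_sampler(effect=1.5), [5, 10, 20])` returns the smallest per-class size whose estimated power reaches 0.8, and passing `pilot_sampler(X_pilot, y_pilot)` instead resamples a pilot data set, as the abstract describes. Because the null is computed once per candidate size, each size costs `n_perm + n_resamples` cross-validation runs rather than `n_resamples * (n_perm + 1)`.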

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Political Science & Economics > Department of Statistics > 1. Journal Articles

Related Researcher

LEE, JAE WON: College of Political Science & Economics (Department of Statistics)

145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea+82-2-3290-2963

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.