Sample size planning for developing classifiers using high-dimensional DNA microarray data

Cited by: 85

Authors:

Dobbin, Kevin K. [1]

Simon, Richard M. [1]

Affiliation:

[1] NCI, Biometr Res Branch, Rockville, MD 20852 USA

Source:

BIOSTATISTICS | 2007 / Vol. 8 / No. 1

Keywords:

gene expression; microarrays; prediction; predictive inference; sample size

DOI:

10.1093/biostatistics/kxj036

Chinese Library Classification (CLC) number:

Q [Biological sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Many gene expression studies attempt to develop a predictor of pre-defined diagnostic or prognostic classes. If the classes are similar biologically, then the number of genes that are differentially expressed between the classes is likely to be small compared to the total number of genes measured. This motivates a two-step process for predictor development: a subset of differentially expressed genes is selected for use in the predictor, and then the predictor is constructed from these genes. Both steps introduce variability into the resulting classifier, so both must be incorporated in sample size estimation. We introduce a methodology for sample size determination for prediction in the context of high-dimensional data that captures variability in both steps of predictor development. The methodology is based on a parametric probability model, but permits sample size computations to be carried out in a practical manner without extensive requirements for preliminary data. We find that many prediction problems do not require a large training set of arrays for classifier development.

Pages: 101-117

Page count: 17
