Relative power and sample size analysis on gene expression profiling data

Cited by: 56
Authors
van Iterson, M. [1, 2]
't Hoen, P. A. C. [1]
Pedotti, P. [1]
Hooiveld, G. J. E. J. [3, 4]
den Dunnen, J. T. [1, 5]
van Ommen, G. J. B. [1]
Boer, J. M. [1, 2]
Menezes, R. X. [1, 2, 6, 7]
Affiliations
[1] Leiden Univ, Med Ctr, Ctr Human & Clin Genet, Leiden, Netherlands
[2] Netherlands Bioinformat Ctr, Nijmegen, Netherlands
[3] TI Food & Nutr, Nutrigenom Consortium, Wageningen, Netherlands
[4] Wageningen Univ, Div Human Nutr, Nutr Metab & Genom Grp, Wageningen, Netherlands
[5] Leiden Univ, Med Ctr, Leiden Genome Technol Ctr, Leiden, Netherlands
[6] Sophia Childrens Univ Hosp, Pediat Lab, Erasmus Med Ctr, Rotterdam, Netherlands
[7] Vrije Univ Amsterdam Med Ctr, Dept Epidemiol & Biostat, Amsterdam, Netherlands
Source
BMC GENOMICS | 2009 / Vol. 10
Keywords
FALSE DISCOVERY RATE; MICROARRAY;
DOI
10.1186/1471-2164-10-439
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: With the increasing number of expression profiling technologies, researchers today must choose the technology that provides sufficient power at minimal sample size, in order to reduce cost and time. Both power and required sample size depend on data variability, which is partly determined by sample type, preparation and processing. Objective measures that support experimental design, based on a researcher's own pilot data, are thus fundamental.
Results: Relative power and sample size analyses were performed on two distinct data sets. The first set consisted of Affymetrix array data derived from a nutrigenomics experiment in which weak, intermediate and strong PPAR alpha agonists were administered to wild-type and PPAR alpha-null mice. Our analysis confirms the previously reported hierarchy of PPAR alpha-activating compounds and the general idea that larger effect sizes contribute positively to the average power of the experiment. A simulation experiment was performed that mimicked the effect sizes seen in the first data set. The relative power was predicted, although the estimates were slightly conservative. The second, more challenging, data set comes from a microarray platform comparison study in which hippocampal delta C-doublecortin-like kinase transgenic mice were compared to wild-type mice; it was combined with results from Solexa/Illumina deep sequencing runs. As expected, the choice of technology greatly influences the performance of the experiment. Solexa/Illumina deep sequencing has the highest overall power, followed by the microarray platforms Agilent and Affymetrix. Interestingly, Solexa/Illumina deep sequencing displays comparable power across all intensity ranges, in contrast with microarray platforms, which have decreased power in the low intensity range due to background noise. Deep sequencing is therefore considerably more powerful than microarray platforms at detecting differences in the low intensity range.
Conclusion: Power and sample size analysis based on pilot data gives valuable information on the performance of the experiment and can thereby guide further decisions on experimental design. Solexa/Illumina deep sequencing is the technology of choice if interest lies in genes expressed in the low intensity range. Researchers can get guidance on experimental design by applying our approach, implemented as the BioConductor package SSPA (http://bioconductor.org/packages/release/bioc/html/SSPA.html), to their own pilot data.
Pages: 10