Significance and statistical errors in the analysis of DNA microarray data

Cited by: 74
Authors
Brody, JP
Williams, BA
Wold, BJ
Quake, SR
Affiliations
[1] CALTECH, Dept Biol, Pasadena, CA 91125 USA
[2] CALTECH, Dept Appl Phys, Pasadena, CA 91125 USA
DOI
10.1073/pnas.162468199
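Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract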
DNA microarrays are important devices for high throughput measurements of gene expression, but no rational foundation has been established for understanding the sources of within-chip statistical error. We designed a specialized chip and protocol to investigate the distribution and magnitude of within-chip errors and discovered that, as expected from theoretical expectations, measurement errors follow a Lorentzian-like distribution, which explains the widely observed but unexplained ill-reproducibility in microarray data. Using this specially designed chip, we examined a data set of repeated measurements to extract estimates of the distribution and magnitude of statistical errors in DNA microarray measurements. Using the common "ratio of medians" method, we find that the measurements follow a Lorentzian-like distribution, which is problematic for subsequent analysis. We show that a method of analysis dubbed "median of ratios" yields a more Gaussian-like distribution of errors. Finally, we show that the bootstrap algorithm can be used to extract the best estimates of the error in the measurement. Quantifying the statistical error in such measurements has important applications for estimating significance levels, clustering algorithms, and process optimization.
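The two estimators named in the abstract, and a bootstrap estimate of their error, can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout (paired per-pixel red and green intensities for one spot), the synthetic data, and all function names are assumptions made for illustration only.

```python
import random
import statistics


def ratio_of_medians(red, green):
    """Median of one channel divided by the median of the other."""
    return statistics.median(red) / statistics.median(green)


def median_of_ratios(red, green):
    """Median of the pairwise red/green ratios."""
    return statistics.median([r / g for r, g in zip(red, green)])


def bootstrap_std_error(red, green, estimator, n_resamples=1000, seed=0):
    """Standard deviation of the estimator over resamples drawn with replacement.

    Pairs are resampled together so each red value stays matched to its green value.
    """
    rng = random.Random(seed)
    pairs = list(zip(red, green))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        r, g = zip(*sample)
        estimates.append(estimator(r, g))
    return statistics.stdev(estimates)


# Synthetic example (hypothetical data): true ratio 2.0 with multiplicative noise.
rng = random.Random(42)
green = [rng.uniform(100.0, 200.0) for _ in range(200)]
red = [2.0 * g * rng.lognormvariate(0.0, 0.1) for g in green]

rom = ratio_of_medians(red, green)
mor = median_of_ratios(red, green)
se_rom = bootstrap_std_error(red, green, ratio_of_medians)
se_mor = bootstrap_std_error(red, green, median_of_ratios)
print(f"ratio of medians: {rom:.3f} +/- {se_rom:.3f}")
print(f"median of ratios: {mor:.3f} +/- {se_mor:.3f}")
```

Resampling the pairs, not each channel separately, preserves any correlation between the two channels, which is what the pairwise ratio depends on.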
Pages: 12975-12978
Page count: 4
Related papers
9 in total
[1]   STATISTICAL-DATA ANALYSIS IN THE COMPUTER-AGE [J].
EFRON, B ;
TIBSHIRANI, R .
SCIENCE, 1991, 253 (5018) :390-395
[2]  
Eisen MB, 1999, METHOD ENZYMOL, V303, P179
[3]  
HINKLEY DV, 1969, BIOMETRIKA, V56, P635, DOI 10.1093/biomet/56.3.635
[4]   Multivariate measurement of gene expression relationships [J].
Kim, SC ;
Dougherty, ER ;
Chen, YD ;
Sivakumar, K ;
Meltzer, P ;
Trent, JM ;
Bittner, M .
GENOMICS, 2000, 67 (02) :201-209
[5]   Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations [J].
Lee, MLT ;
Kuo, FC ;
Whitmore, GA ;
Sklar, J .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) :9834-9839
[6]   QUANTITATIVE MONITORING OF GENE-EXPRESSION PATTERNS WITH A COMPLEMENTARY-DNA MICROARRAY [J].
SCHENA, M ;
SHALON, D ;
DAVIS, RW ;
BROWN, PO .
SCIENCE, 1995, 270 (5235) :467-470
[7]   Systematic determination of genetic network architecture [J].
Tavazoie, S ;
Hughes, JD ;
Campbell, MJ ;
Cho, RJ ;
Church, GM .
NATURE GENETICS, 1999, 22 (03) :281-285
[8]   Significance analysis of microarrays applied to the ionizing radiation response [J].
Tusher, VG ;
Tibshirani, R ;
Chu, G .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (09) :5116-5121
[9]   Prediction of gene function by genome-scale expression analysis: Prostate cancer-associated genes [J].
Walker, MG ;
Volkmuth, W ;
Sprinzak, E ;
Hodgson, D ;
Klingler, T .
GENOME RESEARCH, 1999, 9 (12) :1198-1203