Controlling false-negative errors in microarray differential expression analysis: a PRIM approach

Cited by: 93
Authors
Cole, SW
Galic, Z
Zack, JA
Affiliations
[1] Univ Calif Los Angeles, David Geffen Sch Med, Dept Med, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, David Geffen Sch Med, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, AIDS Inst, Los Angeles, CA 90095 USA
DOI
10.1093/bioinformatics/btg242
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Theoretical considerations suggest that current microarray screening algorithms may fail to detect many true differences in gene expression (Type II analytic errors). We assessed 'false negative' error rates in differential expression analyses by conventional linear statistical models (e.g. t-test), microarray-adapted variants (e.g. SAM, Cyber-T), and a novel strategy based on hold-out cross-validation. The latter approach employs the machine-learning algorithm Patient Rule Induction Method (PRIM) to infer minimum thresholds for reliable change in gene expression from Boolean conjunctions of fold-induction and raw fluorescence measurements.
Results: Monte Carlo analyses based on four empirical data sets show that conventional statistical models and their microarray-adapted variants overlook more than 50% of genes showing significant up-regulation. Conjoint PRIM prediction rules recover approximately twice as many differentially expressed transcripts while maintaining strong control over false-positive (Type I) errors. As a result, experimental replication rates increase and total analytic error rates decline. RT-PCR studies confirm that gene inductions detected by PRIM but overlooked by other methods represent true changes in mRNA levels. PRIM-based conjoint inference rules thus represent an improved strategy for high-sensitivity screening of DNA microarrays.
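The abstract describes PRIM inferring minimum thresholds from a conjunction of fold-induction and raw fluorescence. This record does not include the authors' implementation, so the following is only a minimal sketch of one-sided PRIM-style peeling under stated assumptions: the outcome `y` is taken to be 1 when a gene's change replicated in a hold-out experiment and 0 otherwise, the function names and the `alpha` and `min_support` parameters are hypothetical, and peeling stops as soon as no peel raises the replication rate, which is a simplification of the full procedure.

```python
from typing import List, Sequence


def prim_lower_thresholds(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float = 0.1,
    min_support: float = 0.2,
) -> List[float]:
    """Peel low values off one feature at a time and return, for each
    feature, the minimum threshold that defines the final box."""
    n, d = len(X), len(X[0])
    thresholds = [float("-inf")] * d
    inside = list(range(n))  # indices of observations still in the box
    while True:
        current_mean = sum(y[i] for i in inside) / len(inside)
        best = None  # (mean of y in the peeled box, feature, new threshold, kept indices)
        for j in range(d):
            values = sorted(X[i][j] for i in inside)
            k = max(1, int(alpha * len(values)))  # number of points to peel
            edge = values[k - 1]
            # Drop everything at or below the edge so tied values peel together.
            kept = [i for i in inside if X[i][j] > edge]
            if not kept or len(kept) < min_support * n:
                continue
            mean = sum(y[i] for i in kept) / len(kept)
            if best is None or mean > best[0]:
                best = (mean, j, min(X[i][j] for i in kept), kept)
        # Simplified stopping rule: stop when no peel improves the mean outcome.
        if best is None or best[0] <= current_mean:
            break
        _, j, cut, inside = best
        thresholds[j] = cut
    return thresholds


def apply_rule(X: Sequence[Sequence[float]], thresholds: Sequence[float]) -> List[bool]:
    """Boolean conjunction: a gene is called only if every feature meets its threshold."""
    return [all(v >= t for v, t in zip(row, thresholds)) for row in X]


# Synthetic illustration (not data from the paper): each row is
# (fold-induction, raw fluorescence), and a change "replicates" only when
# fold >= 2 and fluorescence >= 100.
folds = [1.0, 1.5, 2.0, 2.5, 3.0]
signals = [50.0, 100.0, 150.0, 200.0]
X_demo = [[f, s] for f in folds for s in signals]
y_demo = [1.0 if f >= 2.0 and s >= 100.0 else 0.0 for f, s in X_demo]
demo_thresholds = prim_lower_thresholds(X_demo, y_demo)
print(demo_thresholds)
```

In a hold-out setting, the thresholds would be learned on one subset of replicate experiments and `apply_rule` would then be evaluated on the held-out subset to estimate how often the called genes replicate.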
Pages: 1808-1816
Page count: 9