Multiple-testing strategy for analyzing cDNA array data on gene expression

Cited by: 43
Authors
Delongchamp, RR [1]
Bowyer, JF
Chen, JJ
Kodell, RL
Affiliations
[1] Natl Ctr Toxicol Res, Div Biometry & Risk Assessment, Jefferson, AR 72079 USA
[2] Natl Ctr Toxicol Res, Div Neurotoxicol, Jefferson, AR 72079 USA
Keywords
decision theory; false discovery rate; false nondiscovery rate; p-value plot; ROC curves; subset selection
DOI
10.1111/j.0006-341X.2004.00228.x
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
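The workflow described in the abstract (estimate the number of true nulls, derive error rates for a cutoff, then pick the cutoff by relative cost) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the number of true nulls is estimated here with a simple threshold rule (the count of p-values above `lam`, divided by `1 - lam`) rather than the paper's p-value plot, and every name (`estimate_true_nulls`, `error_rates`, `best_cutoff`, `lam`, `cost_fp`, `cost_fn`) and the simulated data are hypothetical choices made only for illustration.

```python
import random


def estimate_true_nulls(pvalues, lam=0.5):
    """Estimate m0, the number of true null hypotheses.

    Assumption: null p-values are uniform on [0, 1], so roughly
    m0 * (1 - lam) of them exceed lam. This threshold rule stands in
    for the p-value plot mentioned in the abstract.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(float(m), n_above / (1.0 - lam))


def error_rates(pvalues, cutoff, m0):
    """Estimate error counts and rates for the rule 'select p <= cutoff'."""
    m = len(pvalues)
    selected = sum(1 for p in pvalues if p <= cutoff)
    # Expected number of unaffected genes falling below the cutoff.
    false_pos = min(float(selected), m0 * cutoff)
    # Affected genes (m - m0) that were not selected.
    false_neg = max(0.0, (m - m0) - (selected - false_pos))
    false_neg = min(false_neg, float(m - selected))
    fdr = false_pos / selected if selected else 0.0
    fnr = false_neg / (m - selected) if selected < m else 0.0
    return {"selected": selected, "false_pos": false_pos,
            "false_neg": false_neg, "fdr": fdr, "fnr": fnr}


def best_cutoff(pvalues, cost_fp=1.0, cost_fn=1.0, lam=0.5):
    """Return the observed p-value that minimizes the estimated total cost."""
    m0 = estimate_true_nulls(pvalues, lam)

    def cost(c):
        r = error_rates(pvalues, c, m0)
        return cost_fp * r["false_pos"] + cost_fn * r["false_neg"]

    return min(sorted(set(pvalues)), key=cost)


def simulate_pvalues(n_null=900, n_affected=100, seed=0):
    """Hypothetical data: uniform null p-values plus small 'affected' ones."""
    rng = random.Random(seed)
    nulls = [rng.random() for _ in range(n_null)]
    affected = [rng.random() * 0.01 for _ in range(n_affected)]
    return nulls + affected


if __name__ == "__main__":
    pvals = simulate_pvalues()
    m0_hat = estimate_true_nulls(pvals)
    # Weight false negatives more heavily, as when missing an affected
    # gene is considered costlier than a few false positives.
    cutoff = best_cutoff(pvals, cost_fp=1.0, cost_fn=5.0)
    print(m0_hat, cutoff, error_rates(pvals, cutoff, m0_hat))
```

Raising `cost_fn` relative to `cost_fp` expresses the preference stated in the abstract for capturing most affected genes over guarding against a few false positives. The cost function is a plain weighted sum of the two estimated error counts, which is one simple way to encode that trade-off.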
Pages: 774-782
Number of pages: 9