Selecting differentially expressed genes from microarray experiments

Cited by: 144
Authors
Pepe, MS [1]
Longton, G
Anderson, GL
Schummer, M
Affiliations
[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[2] Fred Hutchinson Canc Res Ctr, Div Publ Hlth Sci, Seattle, WA 98109 USA
[3] Inst Syst Biol, Seattle, WA 98105 USA
Keywords
classification; discrimination; exploratory analysis; genomics; prediction; proteomics; ROC curves
DOI
10.1111/1541-0420.00016
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
High-throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that could distinguish different tissue types. Of particular interest here is distinguishing between cancerous and normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered, and we argue that two measures related to the receiver operating characteristic (ROC) curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified, and suggest using the "selection probability function," the probability distribution of rankings for each gene, which is estimated via the bootstrap. A real dataset, derived from gene expression arrays of 23 normal and 30 ovarian cancer tissues, is analyzed. Simulation studies are also used to assess the relative performance of different statistical gene ranking measures and our quantification of sampling variability. Our approach leads naturally to a procedure for sample-size calculations, appropriate for exploratory studies that seek to identify differentially expressed genes.
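The ranking and bootstrap ideas in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the record: the full area under the ROC curve (AUC) stands in for the two ROC-related measures, which the abstract does not name; tissues are resampled with replacement within each group; and the data are synthetic. The function names `auc`, `rank_genes`, and `selection_probability` are hypothetical.

```python
import random

def auc(cases, controls):
    """Empirical AUC: P(case > control) + 0.5 * P(tie) (Mann-Whitney form)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def rank_genes(cancer, normal):
    """Return gene indices ordered from highest to lowest AUC.

    cancer, normal: lists of tissue samples; each sample is a list with
    one expression value per gene. A high AUC means the gene tends to be
    over-expressed in cancer (assumption: only up-regulation is scored).
    """
    n_genes = len(cancer[0])
    scores = [
        auc([s[g] for s in cancer], [s[g] for s in normal])
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: -scores[g])

def selection_probability(cancer, normal, k, n_boot=200, seed=0):
    """Bootstrap estimate of P(gene is ranked in the top k), per gene."""
    rng = random.Random(seed)
    n_genes = len(cancer[0])
    counts = [0] * n_genes
    for _ in range(n_boot):
        # Assumption: resample whole tissues within each group.
        boot_cancer = [rng.choice(cancer) for _ in cancer]
        boot_normal = [rng.choice(normal) for _ in normal]
        for g in rank_genes(boot_cancer, boot_normal)[:k]:
            counts[g] += 1
    return [c / n_boot for c in counts]

# Synthetic data (not the ovarian dataset): 30 cancer and 23 normal
# tissues, 5 genes, with gene 0 shifted upward in cancer.
data_rng = random.Random(1)
n_genes = 5
cancer = [
    [data_rng.gauss(3.0 if g == 0 else 0.0, 1.0) for g in range(n_genes)]
    for _ in range(30)
]
normal = [
    [data_rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(23)
]
probs = selection_probability(cancer, normal, k=1)
print(probs)
```

Each entry of `probs` estimates how often the corresponding gene would be ranked first if the experiment were repeated. A gene with a high observed rank but a low selection probability would be a less reliable candidate than its rank alone suggests.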
Pages: 133-142
Number of pages: 10