The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Cited by: 184
Authors
Shi, Leming [1]
Jones, Wendell D. [2]
Jensen, Roderick V. [3]
Harris, Stephen C. [1]
Perkins, Roger G. [4]
Goodsaid, Federico M. [5]
Guo, Lei [1]
Croner, Lisa J. [6]
Boysen, Cecilie [7]
Fang, Hong [4]
Qian, Feng [4]
Amur, Shashi [5]
Bao, Wenjun [8]
Barbacioru, Catalin C. [9]
Bertholet, Vincent [10]
Cao, Xiaoxi Megan [4]
Chu, Tzu-Ming [8]
Collins, Patrick J. [11]
Fan, Xiaohui [1,12]
Frueh, Felix W. [5]
Fuscoe, James C. [1]
Guo, Xu [13]
Han, Jing [14]
Herman, Damir [15]
Hong, Huixiao [4]
Kawasaki, Ernest S. [16]
Li, Quan-Zhen [17]
Luo, Yuling [18]
Ma, Yunqing [18]
Mei, Nan [1]
Peterson, Ron L. [19]
Puri, Raj K. [14]
Shippy, Richard [20]
Su, Zhenqiang [1]
Sun, Yongming Andrew [9]
Sun, Hongmei [4]
Thorn, Brett [4]
Turpaz, Yaron [12]
Wang, Charles [21]
Wang, Sue Jane [5]
Warrington, Janet A. [13]
Willey, James C. [22]
Wu, Jie [4]
Xie, Qian [4]
Zhang, Liang [23]
Zhang, Lu [24]
Zhong, Sheng [25]
Wolfinger, Russell D. [8]
Tong, Weida [1]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] Express Anal Inc, Durham, NC 27713 USA
[3] Univ Massachusetts, Dept Phys, Boston, MA 02125 USA
[4] US FDA, Z Tech Corp, NCTR, Jefferson, AR 72079 USA
[5] US FDA, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[6] Biogen Idec Inc, San Diego, CA 92122 USA
[7] ViaLogy Inc, Altadena, CA 91001 USA
[8] SAS Inst Inc, Cary, NC 27513 USA
[9] Applied Biosyst, Foster City, CA 94404 USA
[10] Eppendorf Array Technol, B-5000 Namur, Belgium
[11] Agilent Technol, Santa Clara, CA 95051 USA
[12] Zhejiang Univ, Pharmaceut Informat Inst, Hangzhou 310027, Peoples R China
[13] Affymetrix Inc, Santa Clara, CA 95051 USA
[14] US FDA, Ctr Biol Evaluat & Res, Bethesda, MD 20892 USA
[15] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[16] Natl Canc Inst, Adv Technol Ctr, Gaithersburg, MD 20877 USA
[17] Univ Texas SW Med Ctr Dallas, Dallas, TX 75390 USA
[18] Panomics Inc, Fremont, CA 94555 USA
[19] Novartis Inst Biomed Res, Cambridge, MA 02139 USA
[20] GE Healthcare, Tempe, AZ 85284 USA
[21] Univ Calif Los Angeles, David Geffen Sch Med, Cedars Sinai Med Ctr, Los Angeles, CA 90048 USA
[22] Ohio Med Univ, Toledo, OH 43614 USA
[23] CapitalBio Corp, Beijing 102206, Peoples R China
[24] Solexa Inc, Hayward, CA 94545 USA
[25] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
DOI
10.1186/1471-2105-9-S9-S10
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.
Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
Conclusion: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
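The recommended selection rule (filter by a non-stringent P cutoff, then rank by fold change) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `select_degs` and `list_overlap`, the default cutoff of 0.05, the toy gene values, and the overlap measure (shared genes divided by the length of the shorter list) are all assumptions made for illustration, and the per-gene P-values and log2 fold changes are taken as precomputed inputs.

```python
from typing import Dict, List


def select_degs(log2_fc: Dict[str, float],
                p_values: Dict[str, float],
                p_cutoff: float = 0.05,   # assumed value; the abstract only says "non-stringent"
                n_genes: int = 100) -> List[str]:
    """Keep genes passing the P cutoff, then rank them by absolute log2 fold change."""
    passing = [gene for gene in log2_fc if p_values[gene] < p_cutoff]
    # Largest |log2 FC| first; ties are broken by gene name so the result is deterministic.
    passing.sort(key=lambda gene: (-abs(log2_fc[gene]), gene))
    return passing[:n_genes]


def list_overlap(list_a: List[str], list_b: List[str]) -> float:
    """Hypothetical reproducibility measure: shared genes over the shorter list's length."""
    if not list_a or not list_b:
        return 0.0
    shared = set(list_a) & set(list_b)
    return len(shared) / min(len(list_a), len(list_b))


# Toy data for two hypothetical test sites (values invented for illustration only).
site1_fc = {"G1": 2.0, "G2": -1.5, "G3": 0.2, "G4": 3.0}
site1_p = {"G1": 0.01, "G2": 0.04, "G3": 0.001, "G4": 0.20}
site2_fc = {"G1": 1.8, "G2": -0.3, "G3": 0.4, "G4": 2.5}
site2_p = {"G1": 0.02, "G2": 0.03, "G3": 0.002, "G4": 0.01}

site1_degs = select_degs(site1_fc, site1_p, p_cutoff=0.05, n_genes=2)
site2_degs = select_degs(site2_fc, site2_p, p_cutoff=0.05, n_genes=2)
overlap = list_overlap(site1_degs, site2_degs)
```

In this toy example G3 has the smallest P-value at both sites but a small fold change, so it is ranked last among the genes that pass the filter. Under P-only ranking it would head both lists, which is the kind of behaviour the abstract contrasts with FC-ranking.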
Pages: 19