The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Times cited: 184
Authors
Shi, Leming [1]
Jones, Wendell D. [2]
Jensen, Roderick V. [3]
Harris, Stephen C. [1]
Perkins, Roger G. [4]
Goodsaid, Federico M. [5]
Guo, Lei [1]
Croner, Lisa J. [6]
Boysen, Cecilie [7]
Fang, Hong [4]
Qian, Feng [4]
Amur, Shashi [5]
Bao, Wenjun [8]
Barbacioru, Catalin C. [9]
Bertholet, Vincent [10]
Cao, Xiaoxi Megan [4]
Chu, Tzu-Ming [8]
Collins, Patrick J. [11]
Fan, Xiaohui [1, 12]
Frueh, Felix W. [5]
Fuscoe, James C. [1]
Guo, Xu [13]
Han, Jing [14]
Herman, Damir [15]
Hong, Huixiao [4]
Kawasaki, Ernest S. [16]
Li, Quan-Zhen [17]
Luo, Yuling [18]
Ma, Yunqing [18]
Mei, Nan [1]
Peterson, Ron L. [19]
Puri, Raj K. [14]
Shippy, Richard [20]
Su, Zhenqiang [1]
Sun, Yongming Andrew [9]
Sun, Hongmei [4]
Thorn, Brett [4]
Turpaz, Yaron [12]
Wang, Charles [21]
Wang, Sue Jane [5]
Warrington, Janet A. [13]
Willey, James C. [22]
Wu, Jie [4]
Xie, Qian [4]
Zhang, Liang [23]
Zhang, Lu [24]
Zhong, Sheng [25]
Wolfinger, Russell D. [8]
Tong, Weida [1]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] Express Anal Inc, Durham, NC 27713 USA
[3] Univ Massachusetts, Dept Phys, Boston, MA 02125 USA
[4] US FDA, Z Tech Corp, NCTR, Jefferson, AR 72079 USA
[5] US FDA, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[6] Biogen Idec Inc, San Diego, CA 92122 USA
[7] ViaLogy Inc, Altadena, CA 91001 USA
[8] SAS Inst Inc, Cary, NC 27513 USA
[9] Applied Biosyst, Foster City, CA 94404 USA
[10] Eppendorf Array Technol, B-5000 Namur, Belgium
[11] Agilent Technol, Santa Clara, CA 95051 USA
[12] Zhejiang Univ, Pharmaceut Informat Inst, Hangzhou 310027, Peoples R China
[13] Affymetrix Inc, Santa Clara, CA 95051 USA
[14] US FDA, Ctr Biol Evaluat & Res, Bethesda, MD 20892 USA
[15] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[16] Natl Canc Inst, Adv Technol Ctr, Gaithersburg, MD 20877 USA
[17] Univ Texas SW Med Ctr Dallas, Dallas, TX 75390 USA
[18] Panomics Inc, Fremont, CA 94555 USA
[19] Novartis Inst Biomed Res, Cambridge, MA 02139 USA
[20] GE Healthcare, Tempe, AZ 85284 USA
[21] Univ Calif Los Angeles, David Geffen Sch Med, Cedars Sinai Med Ctr, Los Angeles, CA 90048 USA
[22] Ohio Med Univ, Toledo, OH 43614 USA
[23] CapitalBio Corp, Beijing 102206, Peoples R China
[24] Solexa Inc, Hayward, CA 94545 USA
[25] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
DOI
10.1186/1471-2105-9-S9-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resulting variety of existing and emerging methods adds to the confusion and continuing debate in the microarray community over which methods should be used to identify reliable DEG lists.
Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated how a few widely used gene selection procedures affect the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, widely regarded as the "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion together with a non-stringent P-value cutoff as a filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of the t-values: the more stringent the P-value threshold, the less reproducible the DEG list. These observations are also consistent with the results of extensive simulations.
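The contrast drawn in points (1) to (3) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the MAQC analysis: it assumes two hypothetical "sites" that independently measure the same simulated truth (5,000 genes, 500 of them truly differentially expressed with |log2 FC| between 0.5 and 3, five replicates per sample, a common noise SD of 0.2 on the log2 scale), scores each gene with a pooled-variance t-test, and measures agreement as the percentage of genes shared by the two top-100 lists. All function names and parameter values are illustrative assumptions, and P-values are computed by Simpson's-rule integration of the t density so that only the Python standard library is needed.

```python
import math
import random
import statistics


def two_sided_p(t, df, steps=400):
    """Two-sided P-value of Student's t, by Simpson's rule on the t density."""
    x = abs(t)
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # area under the density between 0 and |t|
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def simulate_site(true_lfc, n_rep, sd, rng):
    """One hypothetical site: per-gene (observed log2 FC, pooled-variance t)."""
    out = []
    for lfc in true_lfc:
        a = [rng.gauss(0.0, sd) for _ in range(n_rep)]
        b = [rng.gauss(lfc, sd) for _ in range(n_rep)]
        fc = statistics.fmean(b) - statistics.fmean(a)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        out.append((fc, fc / math.sqrt(pooled_var * 2 / n_rep)))
    return out


def top_by_p(site, n):
    """P-ranking: with a common df, smallest P is the same as largest |t|."""
    order = sorted(range(len(site)), key=lambda i: -abs(site[i][1]))
    return set(order[:n])


def top_by_fc(site, n, df, p_cutoff=0.05):
    """FC-ranking with a non-stringent P filter: largest |FC| among P < cutoff."""
    chosen = []
    for g in sorted(range(len(site)), key=lambda i: -abs(site[i][0])):
        if two_sided_p(site[g][1], df) < p_cutoff:
            chosen.append(g)
            if len(chosen) == n:
                break
    return set(chosen)


def pog(list_a, list_b):
    """Percentage of genes shared by two equal-length DEG lists."""
    return 100.0 * len(list_a & list_b) / len(list_a)


rng = random.Random(2008)
n_genes, n_de, n_rep, sd, n_select = 5000, 500, 5, 0.2, 100
true_lfc = [rng.choice((-1, 1)) * rng.uniform(0.5, 3.0) if g < n_de else 0.0
            for g in range(n_genes)]
site1 = simulate_site(true_lfc, n_rep, sd, rng)
site2 = simulate_site(true_lfc, n_rep, sd, rng)
df = 2 * n_rep - 2

p_pog = pog(top_by_p(site1, n_select), top_by_p(site2, n_select))
fc_pog = pog(top_by_fc(site1, n_select, df), top_by_fc(site2, n_select, df))
print(f"P-ranked lists share {p_pog:.0f}% of genes")
print(f"FC-ranked lists share {fc_pog:.0f}% of genes")
```

Under these assumed settings the FC-ranked lists should overlap noticeably more than the P-ranked ones: the denominator of each t-statistic (a sample SD from only five replicates per sample) fluctuates considerably between sites, while the mean difference is comparatively stable. The exact percentages depend on the seed and parameters.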
Conclusion: We recommend FC ranking plus a non-stringent P cutoff as a straightforward baseline practice for generating more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small), and the FC should be as large as possible. Our results provide practical guidance for choosing appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
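The division of labor between the two criteria can be sketched on a small example. This is an illustrative assumption, not the authors' code: it uses a hypothetical table of genes whose true status is known (as it would be only in simulated or spike-in data), fixes a two-fold FC threshold (|log2 FC| ≥ 1), ranks passing genes by |log2 FC|, and reports sensitivity and specificity at two P cutoffs. The names `GeneStat`, `select_degs`, and `sensitivity_specificity`, and all gene values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneStat:
    name: str
    log2_fc: float
    p_value: float
    truly_de: bool  # known only for simulated or spike-in data


def select_degs(genes, fc_cutoff, p_cutoff):
    """FC-ranked DEG list: genes passing both cutoffs, largest |log2 FC| first."""
    passing = [g for g in genes
               if abs(g.log2_fc) >= fc_cutoff and g.p_value < p_cutoff]
    return sorted(passing, key=lambda g: abs(g.log2_fc), reverse=True)


def sensitivity_specificity(genes, selected):
    """Fraction of true DEGs selected, and fraction of non-DEGs left out."""
    chosen = {g.name for g in selected}
    tp = sum(g.truly_de and g.name in chosen for g in genes)
    fn = sum(g.truly_de and g.name not in chosen for g in genes)
    fp = sum(not g.truly_de and g.name in chosen for g in genes)
    tn = sum(not g.truly_de and g.name not in chosen for g in genes)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gene table (values invented for illustration).
genes = [
    GeneStat("g1", 3.1, 1e-6, True),
    GeneStat("g2", -2.4, 2e-4, True),
    GeneStat("g3", 1.8, 0.003, True),
    GeneStat("g4", 1.2, 0.03, True),
    GeneStat("g5", 1.5, 0.04, False),   # large FC but not truly DE
    GeneStat("g6", 0.3, 1e-5, False),   # near-zero FC with a very small P
    GeneStat("g7", -1.1, 0.20, False),
    GeneStat("g8", 0.1, 0.70, False),
]

for p_cut in (0.05, 0.01):
    sens, spec = sensitivity_specificity(genes, select_degs(genes, 1.0, p_cut))
    print(f"P < {p_cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# P < 0.05: sensitivity 1.00, specificity 0.75
# P < 0.01: sensitivity 0.75, specificity 1.00
```

In this toy table, loosening the P cutoff from 0.01 to 0.05 admits one more true DEG (g4) at the cost of one false positive (g5), which is the sensitivity/specificity trade-off the P criterion controls, while the FC ordering of the genes already on the list is unchanged. Gene g6, which P-ranking alone would place near the top, is excluded by the FC threshold.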
Pages: 19