A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments

Cited by: 366
Authors
Pan, W [1]
Affiliation
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
DOI
10.1093/bioinformatics/18.4.546
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Recently, several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. However, it may not be clear how these methods compare with each other. Our main goal here is to compare three methods: the t-test, a regression modeling approach (Thomas et al., Genome Res., 11, 1227-1236, 2001), and a mixture model approach (Pan et al., http://www.biostat.umn.edu/cgi-bin/rrs?print+2001, 2001a,b), with particular attention to their different modeling assumptions.

Results: All three methods are based on the two-sample t-statistic or a minor variation of it, but they differ in how they associate a statistical significance level with that statistic. This can lead to large differences in the resulting significance levels and in the numbers of genes detected. In particular, we give an explicit formula for the test statistic used in the regression approach. We illustrate these points using the leukemia data of Golub et al. (Science, 285, 531-537, 1999). We also briefly compare the results with those of several other methods, including the empirical Bayesian method of Efron et al. (J. Am. Stat. Assoc., to appear, 2001) and the Significance Analysis of Microarray (SAM) method of Tusher et al. (Proc. Natl Acad. Sci. USA, 98, 5116-5121, 2001).

Contact: weip@biostat.umn.edu.
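The abstract notes that all three compared methods start from the two-sample t-statistic computed gene by gene. As a rough illustration only, the sketch below computes that statistic for each gene from replicated measurements under two conditions. It assumes the unequal-variance (Welch) form of the statistic; the abstract does not say which variant the paper uses, and the function names and toy data are hypothetical. It does not attempt the step on which the methods differ, namely assigning a significance level to the statistic.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Unequal-variance (Welch) two-sample t-statistic for one gene.

    x and y are the replicated expression levels under each condition.
    Choosing the Welch form is an assumption made for this sketch.
    """
    n1, n2 = len(x), len(y)
    # Standard error of the difference in means, using sample variances.
    se = sqrt(variance(x) / n1 + variance(y) / n2)
    return (mean(x) - mean(y)) / se


def t_statistics(cond1, cond2):
    """Compute the t-statistic for every gene.

    cond1 and cond2 map a gene name to its list of replicate values.
    """
    return {gene: two_sample_t(cond1[gene], cond2[gene]) for gene in cond1}


# Hypothetical toy data: two genes, three replicates per condition.
cond1 = {"geneA": [2.0, 4.0, 6.0], "geneB": [5.0, 5.5, 4.5]}
cond2 = {"geneA": [1.0, 3.0, 5.0], "geneB": [1.0, 1.5, 0.5]}

t = t_statistics(cond1, cond2)
for gene, value in t.items():
    print(f"{gene}: t = {value:.3f}")
```

In this toy data, geneB has a large mean difference relative to its replicate variability and so receives a much larger statistic than geneA. Turning such statistics into significance levels is where the t-test, regression, and mixture model approaches part ways.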
Pages: 546-554
Page count: 9
Related papers
31 in total
  • [1] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [2] WELCH APPROXIMATE SOLUTION FOR THE BEHRENS-FISHER PROBLEM
    BEST, DJ
    RAYNER, JCW
    [J]. TECHNOMETRICS, 1987, 29 (02) : 205 - 210
  • [3] Exploring the new world of the genome with DNA microarrays
    Brown, PO
    Botstein, D
    [J]. NATURE GENETICS, 1999, 21 (Suppl 1) : 33 - 37
  • [4] Chen Y, 1997, J Biomed Opt, V2, P364, DOI 10.1117/12.281504
  • [5] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [6] Devore J., 1997, Statistics: the exploration and analysis of data
  • [7] REGRESSION-MODELS FOR DISCRETE LONGITUDINAL RESPONSES - COMMENT AND REJOINDER
    DRUM, M
    MCCULLAGH, P
    PRENTICE, RL
    MANCL, LA
    ZEGER, S
    LIANG, KY
    HEAGERTY, P
    FITZMAURICE, G
    LAIRD, NM
    ROTNITSKY, AG
    [J]. STATISTICAL SCIENCE, 1993, 8 (03) : 300 - 309
  • [8] DUDOIT S, 2000, STAT METHODS IDENTIF
  • [9] EFRON B, 2001, IN PRESS J AM STAT A
  • [10] EFRON B, 2000, UNPUB MICROARRAYS TH