Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset

Times cited: 289
Authors
Choe, SE
Boutros, M
Michelson, AM
Church, GM
Halfon, MS
Affiliations
[1] Brigham & Womens Hosp, Dept Med, Div Genet, Boston, MA 02115 USA
[2] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[3] Brigham & Womens Hosp, Howard Hughes Med Inst, Boston, MA 02115 USA
[4] SUNY Buffalo, Dept Biochem, Buffalo, NY 14214 USA
[5] SUNY Buffalo, Ctr Excellence Bioinformat, Buffalo, NY 14214 USA
[6] German Canc Res Ctr, DKFZ B110, D-69120 Heidelberg, Germany
DOI
10.1186/gb-2005-6-2-r16
Chinese Library Classification (CLC) codes
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important.
Results: We present a 'spike-in' experiment for Affymetrix GeneChips that provides a defined dataset of 3,860 RNA species, which we use to evaluate analysis options for identifying differentially expressed genes. The experimental design incorporates two novel features. First, to obtain accurate estimates of false-positive and false-negative rates, 100-200 RNAs are spiked in at each fold-change level of interest, ranging from 1.2-fold to 4-fold. Second, instead of using an uncharacterized background RNA sample, a set of 2,551 RNA species is used as the constant (1x) set, allowing us to know whether any given probe set is truly present or absent. Applying a large number of analysis methods to this dataset reveals clear variation in their ability to identify differentially expressed genes. False-negative and false-positive rates are minimized when the following options are chosen: subtracting nonspecific signal from the PM probe intensities; performing an intensity-dependent normalization at the probe set level; and incorporating a signal intensity-dependent standard deviation in the test statistic.
Conclusions: A best-route combination of analysis methods is presented that allows detection of approximately 70% of true positives before reaching a 10% false-discovery rate. We highlight areas in need of improvement, including better estimates of false-discovery rates and decreased false-negative rates.
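Two of the ideas in the abstract can be illustrated with a small Python sketch; this is not the authors' code. It shows (1) a test statistic whose standard deviation depends on signal intensity and (2) scoring a ranked gene list against a fully known truth set, which is what a wholly defined spike-in dataset makes possible. Assumptions, all hypothetical: the inputs are already background-subtracted, normalized log2 signals (those earlier steps are not shown); the intensity-dependent prior SD is the median SD of the `window` genes nearest in mean intensity; the shrinkage formula, `nu0`, the toy data, and the function names (`intensity_prior_sd`, `regularized_t`, `analyze`, `tp_fraction_at_fdr`) are illustrative choices. The true-positive fraction is read at the deepest cutoff whose observed FDR stays at or below the threshold.

```python
import math
import statistics
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy data: gene -> (log2 signals in condition A, log2 signals in condition B),
# assumed already background-subtracted and normalized.
TOY_DATA: Dict[str, Tuple[List[float], List[float]]] = {
    "up1": ([8.0, 8.1, 7.9], [9.0, 9.1, 8.9]),
    "up2": ([10.0, 10.2, 9.8], [10.6, 10.8, 10.4]),
    "flat1": ([8.0, 8.2, 7.8], [8.1, 7.9, 8.0]),
    "flat2": ([10.0, 9.9, 10.1], [10.0, 10.1, 9.9]),
}
TRUTH: Set[str] = {"up1", "up2"}  # genes known to differ, as in a spike-in design


def intensity_prior_sd(means: Sequence[float], sds: Sequence[float], window: int = 5) -> List[float]:
    """Per gene: median SD of the `window` genes closest to it in rank of mean intensity."""
    n = len(means)
    order = sorted(range(n), key=lambda i: means[i])
    prior = [0.0] * n
    for r, g in enumerate(order):
        # Clamp the window so it stays inside [0, n).
        lo = max(0, min(r - window // 2, n - window))
        hi = min(n, lo + window)
        prior[g] = statistics.median(sds[order[j]] for j in range(lo, hi))
    return prior


def regularized_t(a: Sequence[float], b: Sequence[float], prior_var: float, nu0: float = 10.0) -> float:
    """Mean difference (B - A) over a standard error built from shrunk variances.

    Needs at least 2 replicates per group.
    """
    def shrunk_var(x: Sequence[float]) -> float:
        n = len(x)
        # Blend the prior variance (weighted like nu0 pseudo-observations) with the sample variance.
        return (nu0 * prior_var + (n - 1) * statistics.variance(x)) / (nu0 + n - 2)

    se = math.sqrt(shrunk_var(a) / len(a) + shrunk_var(b) / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se


def analyze(data: Dict[str, Tuple[Sequence[float], Sequence[float]]],
            window: int = 3, nu0: float = 10.0) -> Dict[str, float]:
    """Score every gene with an intensity-dependent regularized t-statistic."""
    names = list(data)
    means, sds = [], []
    for g in names:
        a, b = data[g]
        means.append(statistics.fmean(list(a) + list(b)))
        # Pooled within-group SD for this gene.
        sds.append(math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2))
    prior = intensity_prior_sd(means, sds, window)
    return {g: regularized_t(*data[g], prior_var=prior[i] ** 2, nu0=nu0)
            for i, g in enumerate(names)}


def tp_fraction_at_fdr(scores: Dict[str, float], truth: Set[str], max_fdr: float = 0.10) -> float:
    """Fraction of true positives found at the deepest cutoff with observed FDR <= max_fdr."""
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    tp = fp = best = 0
    for g in ranked:
        if g in truth:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= max_fdr:
            best = tp
    return best / len(truth)


if __name__ == "__main__":
    scores = analyze(TOY_DATA)
    for gene, t in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
        print(f"{gene}\t{t:+.2f}")
    print("TP fraction at 10% FDR:", tp_fraction_at_fdr(scores, TRUTH))
```

Because every spiked-in and constant RNA is known in a wholly defined dataset, the false-discovery rate here is counted directly rather than estimated; with samples of unknown composition it would have to be estimated instead.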
Pages: 16