Correlation test to assess low-level processing of high-density oligonucleotide microarray data

Cited by: 39
Authors
Ploner, A [1]
Miller, LD
Hall, P
Bergh, J
Pawitan, Y
Affiliations
[1] Karolinska Inst, Stockholm, Sweden
[2] Genome Inst Singapore, Singapore, Singapore
[3] Karolinska Inst & Univ Hosp, Radiumhemmet, Canc Ctr Karolinska, Dept Pathol & Oncol, Stockholm, Sweden
DOI
10.1186/1471-2105-6-80
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but there is no method to assess whether a particular technique is appropriate for a specific data set without reference to external data.
Results: We analyzed coregulation between genes, measured as statistical correlation, in order to detect insufficient normalization between arrays. In a large collection of genes, a randomly chosen pair of genes should on average have zero correlation, which makes a correlation test possible. For all data sets we evaluated, and for the three most commonly used low-level processing procedures (MAS5, RMA and MBEI), housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation among absent genes. We also found that a second round of normalization at the probe-set level consistently and significantly improved normalization.
Conclusion: Previous evaluations of low-level processing in the literature have been limited to artificial spike-in and mixture data sets. In the absence of a known gold standard, the correlation criterion allows us to assess the appropriateness of low-level processing for a specific data set and the success of normalization for subsets of genes.
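The idea behind the correlation criterion can be illustrated with a small simulation. The following is a minimal Python/NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the simulated data, the hypothetical helper names `mean_pairwise_correlation` and `correlation_z`, the independent-Gaussian null used for the rough z-score, and the per-array median centering that stands in for a second round of probe-set-level normalization. The sketch computes the average Pearson correlation over all gene pairs without building the full gene-by-gene matrix, then compares data with and without a shared per-array shift.

```python
import numpy as np


def mean_pairwise_correlation(expr: np.ndarray) -> float:
    """Average Pearson correlation over all distinct gene pairs.

    expr: genes x arrays matrix, e.g. log-scale probe-set values.
    Uses sum_{g != h} r_gh = (||s||^2 - G*n) / n, where s is the sum of the
    row-standardized profiles, so no G x G correlation matrix is built.
    """
    n_genes, n_arrays = expr.shape
    centered = expr - expr.mean(axis=1, keepdims=True)
    # ddof=0 standardization: each row z_g satisfies ||z_g||^2 = n_arrays,
    # and r_gh = z_g . z_h / n_arrays.
    z = centered / centered.std(axis=1, keepdims=True)
    s = z.sum(axis=0)
    off_diag_sum = (s @ s - n_genes * n_arrays) / n_arrays
    return float(off_diag_sum / (n_genes * (n_genes - 1)))


def correlation_z(expr: np.ndarray) -> float:
    """Rough z-score for H0: the mean pairwise correlation is zero.

    ASSUMPTION (illustrative only): genes are independent Gaussian, so each
    r has variance 1/(n-1) and distinct pairs are uncorrelated. Real data
    with true coregulation will have a wider null distribution.
    """
    n_genes, n_arrays = expr.shape
    n_pairs = n_genes * (n_genes - 1) / 2
    return mean_pairwise_correlation(expr) * float(np.sqrt(n_pairs * (n_arrays - 1)))


rng = np.random.default_rng(0)
n_genes, n_arrays = 500, 20
baseline = rng.normal(8.0, 2.0, size=(n_genes, 1))  # gene-specific level
noise = rng.normal(0.0, 1.0, size=(n_genes, n_arrays))
array_effect = rng.normal(0.0, 1.0, size=n_arrays)  # simulated normalization failure

clean = baseline + noise
unnormalized = clean + array_effect  # same shift for every gene on a given array
# Hypothetical second pass: center each array on its median across genes.
renormalized = unnormalized - np.median(unnormalized, axis=0)

for name, x in [("clean", clean), ("unnormalized", unnormalized), ("renormalized", renormalized)]:
    print(f"{name:>13}: mean r = {mean_pairwise_correlation(x):+.4f}, z = {correlation_z(x):+.1f}")
```

A shared per-array shift, the kind of artifact that incomplete normalization leaves behind, makes every gene pair positively correlated, so the average correlation moves well away from zero. Removing the shift brings it back close to zero. The closed-form sum keeps the cost linear in the number of genes, which matters for arrays with tens of thousands of probe sets.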
Pages: 20