Statistical issues in the analysis of Illumina data

Citations: 86
Authors
Dunning, Mark J. [1]
Barbosa-Morais, Nuno L. [1]
Lynch, Andy G. [1]
Tavare, Simon [1]
Ritchie, Matthew E. [1]
Affiliations
[1] Univ Cambridge, Dept Oncol, CRUK Cambridge Res Inst, Li Ka Shing Ctr, Cambridge CB2 0RE, England
Funding
Medical Research Council (UK)
DOI
10.1186/1471-2105-9-85
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an Illumina spike-in experiment and offer guidelines for those wishing to analyse expression data or develop new methodologies for this technology.
Results: We find that the local background estimated by Illumina is consistently low, and subtracting this background is beneficial for detecting differential expression (DE). Illumina's summary method performs well at removing outliers, producing estimates which are less biased and less variable than other robust summary methods. However, quality assessment on summarised data may miss spatial artefacts present in the raw data. Also, we find that the background normalisation method used in Illumina's proprietary software (BeadStudio) can cause problems with a standard DE analysis. We demonstrate that variances calculated from the raw data can be used as inverse weights in the DE analysis to improve power. Finally, variability in both expression levels and DE statistics can be attributed to differences in probe composition. These differences are not accounted for by current analysis methods and require further investigation.
Conclusion: Analysing Illumina expression data using BeadStudio is reasonable because of the conservative estimates of summary values produced by the software. Improvements can, however, be made by not using background normalisation. Access to the raw data allows for a more detailed quality assessment and flexible analyses. In the case of a gene expression study, data can be analysed on an appropriate scale using established tools. Similar improvements can be expected for other Illumina assays.
Pages: 15
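The abstract's remark that variances calculated from the raw data can be used as inverse weights in the DE analysis can be illustrated with a small sketch. This is not the authors' pipeline: the function names `weighted_mean` and `weighted_de_statistic` are hypothetical, the numbers are made up for illustration, and the statistic is a plain two-group weighted z-score chosen only to show how a low-variance array contributes more than a noisy one.

```python
import math


def weighted_mean(values, variances):
    """Inverse-variance weighted mean and its standard error.

    Each value is weighted by 1 / variance, so precise measurements
    count for more than noisy ones. (Hypothetical helper.)
    """
    if not values or len(values) != len(variances):
        raise ValueError("values and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    # Under independence, the variance of the weighted mean is 1 / sum(weights).
    return mean, math.sqrt(1.0 / total)


def weighted_de_statistic(group_a, var_a, group_b, var_b):
    """Difference of weighted means divided by its standard error."""
    mean_a, se_a = weighted_mean(group_a, var_a)
    mean_b, se_b = weighted_mean(group_b, var_b)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    # Illustrative log-scale summary values for one probe on four arrays,
    # with assumed per-array variances of those summaries.
    treated, treated_var = [10.0, 20.0], [1.0, 4.0]
    control, control_var = [8.0, 8.5], [1.0, 1.0]
    print(weighted_mean(treated, treated_var))
    print(weighted_de_statistic(treated, treated_var, control, control_var))
```

In a real analysis the weights would more likely be passed to an established linear-modelling tool than to a hand-rolled statistic; the sketch only shows the weighting arithmetic.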