MDQC: a new quality assessment method for microarrays based on quality control reports

Cited by: 27
Authors
Freue, Gabriela V. Cohen [1]
Hollander, Zsuzsanna
Shen, Enqing
Zamar, Ruben H.
Balshaw, Robert
Scherer, Andreas
McManus, Bruce
Keown, Paul
McMaster, W. Robert
Ng, Raymond T.
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V5Z 1M9, Canada
[2] Univ British Columbia, Dept Stat, Vancouver, BC V5Z 1M9, Canada
[3] Univ British Columbia, Dept Pathol & Lab Med, Vancouver, BC V5Z 1M9, Canada
[4] Univ British Columbia, Dept Med, Vancouver, BC V5Z 1M9, Canada
[5] Univ British Columbia, Dept Med Genet, Vancouver, BC V5Z 1M9, Canada
[6] Univ British Columbia, James Hogg iCAPTURE Ctr, Vancouver, BC V5Z 1M9, Canada
[7] Novartis Pharma AG, Basel, Switzerland
[8] Vancouver Coastal Hlth Res Inst, Vancouver, BC, Canada
DOI
10.1093/bioinformatics/btm487
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The process of producing microarray data involves multiple steps, some of which may suffer from technical problems and seriously damage the quality of the data. Thus, it is essential to identify those arrays with low quality. This article addresses two questions: (1) how to assess the quality of a microarray dataset using the measures provided in quality control (QC) reports; (2) how to identify possible sources of the quality problems.
Results: We propose a novel multivariate approach to evaluate the quality of an array that examines the Mahalanobis distance of its quality attributes from those of other arrays. Thus, we call it Mahalanobis Distance Quality Control (MDQC) and examine different variants of this method. MDQC flags problematic arrays based on the idea of outlier detection, i.e. it flags those arrays whose quality attributes jointly depart from those of the bulk of the data. Using two case studies, we show that a multivariate analysis gives substantially richer information than analyzing each parameter of the QC report in isolation. Moreover, once the QC report is produced, our quality assessment method is computationally inexpensive and the results can be easily visualized and interpreted. Finally, we show that computing these distances on subsets of the quality measures in the report may increase the method's ability to detect unusual arrays and help to identify possible causes of the quality problems.
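The outlier-detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy QC table, the use of the classical sample mean and covariance, and the chi-square-based cutoff are all assumptions made here for illustration, since the abstract does not specify the estimators or the threshold.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean.

    Assumption: classical sample mean and covariance are used for brevity;
    the abstract does not say which estimators MDQC relies on.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Solve cov @ z = centered.T rather than forming an explicit inverse.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


def flag_arrays(X, cutoff):
    """Return indices of rows whose squared distance exceeds the cutoff."""
    d2 = mahalanobis_sq(X)
    return np.flatnonzero(d2 > cutoff), d2


# Hypothetical QC table: 21 arrays, 2 strongly correlated quality attributes.
n = 21
u = np.linspace(-1.0, 1.0, n)          # shared trend across arrays
v = 0.05 * (-1.0) ** np.arange(n)      # small alternating jitter
v[10] = 0.6                            # array 10 breaks the joint pattern
qc = np.column_stack([u + v, u - v])

# Assumed cutoff: 0.99 quantile of a chi-square with 2 degrees of freedom,
# which has the closed form -2 * ln(1 - 0.99).
cutoff = -2.0 * np.log(0.01)
flagged, d2 = flag_arrays(qc, cutoff)

# Per-attribute z-scores, for comparison with the multivariate distance.
z = np.abs(qc - qc.mean(axis=0)) / qc.std(axis=0, ddof=1)

print("flagged arrays:", flagged.tolist())
print("largest |z| for array 10:", z[10].max())
```

In this toy table, array 10 lies within the normal range of each attribute taken separately, yet its joint pattern is unusual, so only the multivariate distance flags it. Passing a subset of columns to `flag_arrays` would mirror the abstract's remark about computing distances on subsets of the quality measures.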
Pages: 3162-3169
Number of pages: 8