Adjustment of systematic microarray data biases

Cited by: 297
Authors
Benito, M
Parker, J
Du, Q
Wu, JY
Xiang, D
Perou, CM [1]
Marron, JS
Affiliations
[1] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
[3] Univ N Carolina, Dept Pathol & Lab Med, Chapel Hill, NC 27599 USA
[4] Univ Carlos III Madrid, Dept Stat & Econometr, Madrid, Spain
[5] Karolinska Inst, Dept Mol Med, S-17176 Stockholm, Sweden
[6] Univ N Carolina, Dept Stat, Chapel Hill, NC 27599 USA
DOI
10.1093/bioinformatics/btg385
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Systematic differences due to experimental features of microarray experiments are present in most large microarray data sets. Many different experimental features can cause biases including different sources of RNA, different production lots of microarrays or different microarray platforms. These systematic effects present a substantial hurdle to the analysis of microarray data.
Results: We present here a new method for the identification and adjustment of systematic biases that are present within microarray data sets. Our approach is based on modern statistical discrimination methods and is shown to be very effective in removing systematic biases present in a previously published breast tumor cDNA microarray data set. The new method of 'Distance Weighted Discrimination (DWD)' is shown to be better than Support Vector Machines and Singular Value Decomposition for the adjustment of systematic microarray effects. In addition, it is shown to be of general use as a tool for the discrimination of systematic problems present in microarray data sets, including the merging of two breast tumor data sets completed on different microarray platforms.
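Illustrative sketch (not part of the original record): the abstract describes adjusting a systematic bias by means of a discrimination method. One way to picture such an adjustment is to find a direction that separates two batches, project each batch onto it, and shift each batch so that its mean projection is zero. The code below is only a rough sketch under that assumption. It uses the normalized difference of batch means as a simple stand-in for the separating direction; it does not compute a DWD direction, which requires solving an optimization problem. All function names and the toy data are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length sample vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit_direction(batch_a, batch_b):
    """Stand-in separating direction: normalized difference of batch means.

    Assumption: this replaces the DWD direction for illustration only.
    """
    ma, mb = mean_vector(batch_a), mean_vector(batch_b)
    diff = [a - b for a, b in zip(ma, mb)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("batch means coincide; nothing to adjust")
    return [d / norm for d in diff]


def shift_along(batch, direction):
    """Subtract the batch's mean projection onto `direction` from each sample."""
    projections = [sum(x * w for x, w in zip(row, direction)) for row in batch]
    mean_proj = sum(projections) / len(projections)
    return [[x - mean_proj * w for x, w in zip(row, direction)] for row in batch]


def adjust_batches(batch_a, batch_b):
    """Remove the between-batch offset along the separating direction."""
    w = unit_direction(batch_a, batch_b)
    return shift_along(batch_a, w), shift_along(batch_b, w)


# Toy data: rows are samples, columns are genes; batch_b is offset in gene 0.
batch_a = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
batch_b = [[3.0, 2.1], [3.2, 1.9], [2.8, 2.0]]
adj_a, adj_b = adjust_batches(batch_a, batch_b)
```

With the mean-difference direction used here, the two adjusted batches end up with identical mean vectors. A direction obtained from a discrimination method such as DWD or a Support Vector Machine would generally differ from this stand-in, so the result of the same shifting step would differ as well.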
Pages: 105-114
Page count: 10
Related papers
17 in total
  • [1] Singular value decomposition for genome-wide expression data processing and modeling
    Alter, O
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) : 10101 - 10106
  • [2] [Anonymous], 1982, ESTIMATION DEPENDENC
  • [3] A tutorial on Support Vector Machines for pattern recognition
    Burges, CJC
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) : 121 - 167
  • [4] Cristianini N., 2000, Intelligent Data Analysis: An Introduction, DOI 10.1017/CBO9780511801389
  • [5] Eisen MB, 1999, METHOD ENZYMOL, V303, P179
  • [6] The Stanford Microarray Database: data access and quality assessment tools
    Gollub, J
    Ball, CA
    Binkley, G
    Demeter, J
    Finkelstein, DB
    Hebert, JM
    Hernandez-Boussard, T
    Jin, H
    Kaloper, M
    Matese, JC
    Schroeder, M
    Brown, PO
    Botstein, D
    Sherlock, G
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 94 - 96
  • [7] MARRON J, 2002, DISTANCE WEIGHTED DI
  • [8] Molecular characterisation of soft tissue tumours: a gene expression study
    Nielsen, TO
    West, RB
    Linn, SC
    Alter, O
    Knowling, MA
    O'Connell, JX
    Zhu, S
    Fero, M
    Sherlock, G
    Pollack, JR
    Brown, PO
    Botstein, D
    van de Rijn, M
    [J]. LANCET, 2002, 359 (9314) : 1301 - 1307
  • [9] Molecular portraits of human breast tumours
    Perou, CM
    Sorlie, T
    Eisen, MB
    van de Rijn, M
    Jeffrey, SS
    Rees, CA
    Pollack, JR
    Ross, DT
    Johnsen, H
    Akslen, LA
    Fluge, O
    Pergamenschikov, A
    Williams, C
    Zhu, SX
    Lonning, PE
    Borresen-Dale, AL
    Brown, PO
    Botstein, D
    [J]. NATURE, 2000, 406 (6797) : 747 - 752
  • [10] Repeated observation of breast tumor subtypes in independent gene expression data sets
    Sorlie, T
    Tibshirani, R
    Parker, J
    Hastie, T
    Marron, JS
    Nobel, A
    Deng, S
    Johnsen, H
    Pesich, R
    Geisler, S
    Demeter, J
    Perou, CM
    Lonning, PE
    Brown, PO
    Borresen-Dale, AL
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (14) : 8418 - 8423