The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis

Cited by: 111
Authors
Sims, Andrew H. [1, 2]
Smethurst, Graeme J. [3]
Hey, Yvonne [4]
Okoniewski, Michal J. [3, 5]
Pepper, Stuart D. [4]
Howell, Anthony [2]
Miller, Crispin J. [3]
Clarke, Robert B. [2]
Affiliations
[1] Western Gen Hosp, Edinburgh Canc Res Ctr, Breakthrough Res Unit, Appl Bioinformat Canc Res Grp, Edinburgh EH4 2XR, Midlothian, Scotland
[2] Univ Manchester, Sch Canc & Imaging Sci, Breast Biol Grp, Manchester M13 9PL, Lancs, England
[3] Paterson Inst Canc Res, Canc Res UK Appl Computat Biol & Bioinformat Gr, Manchester M20 4BX, Lancs, England
[4] Paterson Inst Canc Res, Canc Res UK Affymetrix Serv, Manchester M20 4BX, Lancs, England
[5] UNI ETH Zurich, Funct Genom Ctr, CH-8057 Zurich, Switzerland
DOI
10.1186/1755-8794-1-42
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.
Results: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) was generated to examine the variability between datasets produced using different amounts of starting RNA, alternative protocols, or different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107 tumours) from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics.
Conclusion: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data from different gene expression datasets can be directly integrated, leading to new biological findings with increased statistical power.
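The batch mean-centering named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes log2-scale expression values, where a multiplicative, gene-specific bias becomes an additive offset that subtracting each gene's per-batch mean removes. The function name `batch_mean_center`, the toy intensities and the scaling factors are all hypothetical.

```python
import math
from collections import defaultdict


def batch_mean_center(log_expr, batches):
    """Subtract each gene's per-batch mean from its log-scale values.

    log_expr: list of rows, one per gene; each row holds one log2 value per sample.
    batches:  list of batch labels, one per sample (same order as the columns).
    """
    # Group column indices by batch label.
    cols_by_batch = defaultdict(list)
    for j, label in enumerate(batches):
        cols_by_batch[label].append(j)

    centered = []
    for row in log_expr:
        new_row = list(row)
        for cols in cols_by_batch.values():
            mean = sum(row[j] for j in cols) / len(cols)
            for j in cols:
                new_row[j] = row[j] - mean
        centered.append(new_row)
    return centered


# Toy data (hypothetical): two genes measured on three samples in batch A.
raw_a = [[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]]
# Batch B repeats the same samples, with each gene scaled by its own factor,
# mimicking a multiplicative, systematic bias.
factors = [2.0, 0.5]
raw_b = [[v * f for v in row] for row, f in zip(raw_a, factors)]

batches = ["A"] * 3 + ["B"] * 3
log_expr = [[math.log2(v) for v in ra + rb] for ra, rb in zip(raw_a, raw_b)]

centered = batch_mean_center(log_expr, batches)
for row in centered:
    print([round(v, 3) for v in row])
```

In this toy case the centered values for batches A and B coincide for every gene, because the only difference between the batches was a constant per-gene factor. Centering also removes any genuine difference in mean expression between batches, which is consistent with the abstract's caution that results depend on the composition of the datasets.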
Pages: 14