A protocol for building and evaluating predictors of disease state based on microarray data

Cited by: 101
Authors
Wessels, LFA
Reinders, MJT
Hart, AAM
Veenman, CJ
Dai, H
He, YD
van't Veer, LJ
Affiliations
[1] Delft Univ Technol, Fac Elect Engn Math & Comp Sci, Dept Mediamat, NL-2628 CD Delft, Netherlands
[2] Netherlands Canc Inst, Dept Pathol, NL-1066 CX Amsterdam, Netherlands
[3] Netherlands Canc Inst, Dept Radiotherapy, NL-1066 CX Amsterdam, Netherlands
[4] Rosetta Inpharmat LLC, Merck & Co Inc, Seattle, WA 98109 USA
DOI
10.1093/bioinformatics/bti429
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no 'standard' computational approach has emerged that enables robust outcome prediction.
Results: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.
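The kind of training-validation protocol the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two classes labelled 0 and 1, a signal-to-noise ranking as the forward filter, a nearest mean classifier with Euclidean distance, and an inner cross-validation that picks the number of reporter genes. All function names, the `k_grid` values, and the fold counts are hypothetical choices made for illustration. The point it shows is that gene ranking and parameter selection use training samples only, so the validation error is not biased by the selection step.

```python
import numpy as np

def rank_genes(X, y):
    """Rank genes by absolute signal-to-noise ratio, best first (assumed filter criterion)."""
    a, b = X[y == 0], X[y == 1]
    snr = (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(-np.abs(snr))

def nearest_mean_predict(X_train, y_train, X_test):
    """Assign each test sample to the class (0 or 1) with the closest mean profile."""
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def stratified_folds(y, n_folds, rng):
    """Split sample indices into folds that keep class proportions roughly equal."""
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for i, j in enumerate(idx):
            folds[i % n_folds].append(j)
    return [np.array(f) for f in folds]

def cv_error(X, y, k, n_folds, rng):
    """Cross-validated error of a k-gene predictor; genes are re-ranked in every fold."""
    errors = 0
    for test in stratified_folds(y, n_folds, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        genes = rank_genes(X[train], y[train])[:k]
        pred = nearest_mean_predict(X[train][:, genes], y[train], X[test][:, genes])
        errors += int((pred != y[test]).sum())
    return errors / len(y)

def train_validate(X, y, k_grid=(1, 2, 5, 10, 20), n_outer=3, n_inner=3, seed=0):
    """Outer loop estimates the error; inner loop chooses k on training data only."""
    rng = np.random.default_rng(seed)
    errors = 0
    for val in stratified_folds(y, n_outer, rng):
        train = np.setdiff1d(np.arange(len(y)), val)
        Xt, yt = X[train], y[train]
        best_k = min(k_grid, key=lambda k: cv_error(Xt, yt, k, n_inner, rng))
        genes = rank_genes(Xt, yt)[:best_k]
        pred = nearest_mean_predict(Xt[:, genes], yt, X[val][:, genes])
        errors += int((pred != y[val]).sum())
    return errors / len(y)
```

Calling `train_validate(X, y)` on a samples-by-genes matrix `X` and a 0/1 label vector `y` returns the fraction of validation samples misclassified. The sketch assumes every fold retains samples from both classes, which holds only when each class has clearly more samples than folds.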
Pages: 3755-3762
Page count: 8