An empirical assessment of validation practices for molecular classifiers

Cited by: 68
Authors
Castaldi, Peter J. [1]
Dahabreh, Issa J. [1]
Ioannidis, John P. A. [1]
Affiliations
[1] Stanford Univ, Sch Med, Stanford Prevent Res Ctr, Fac Med Sch, Stanford, CA 94305 USA
Funding
US National Institutes of Health (NIH);
Keywords
predictive medicine; genes; gene expression; proteomics; STAGE OVARIAN-CANCER; GENE-EXPRESSION DATA; BREAST-CANCER; CELL-CARCINOMA; PUBLISHED MICROARRAY; DIAGNOSTIC-TESTS; CROSS-VALIDATION; STATISTICS NOTES; ERROR ESTIMATION; META-REGRESSION;
DOI
10.1093/bib/bbq073
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21-61%) and 29% (IQR, 15-65%), respectively]. The median reported classification performance for sensitivity and specificity was 94% and 98%, respectively, in cross-validation and 88% and 81% for independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04-5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n=758) which cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice.
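The diagnostic odds ratio (DOR) mentioned in the abstract combines sensitivity and specificity into a single number: the odds of a positive test among cases divided by the odds of a positive test among non-cases. The following is a minimal sketch of that definition in Python, applied to the median sensitivities and specificities quoted above. The function names are hypothetical, and the result is illustrative only: the reported relative DOR of 3.26 is presumably estimated from study-level data, so a ratio computed from the medians is not expected to reproduce it.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the DOR: (sens / (1 - sens)) * (spec / (1 - spec)).

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))


def relative_dor(internal: tuple[float, float], external: tuple[float, float]) -> float:
    """Ratio of the internal-validation DOR to the external-validation DOR.

    Each argument is a (sensitivity, specificity) pair. Values above 1
    mean the classifier looked better internally than externally.
    """
    return diagnostic_odds_ratio(*internal) / diagnostic_odds_ratio(*external)


# Median values quoted in the abstract (illustration only, not the
# study-level analysis behind the reported estimate of 3.26).
cross_validation = (0.94, 0.98)
independent_validation = (0.88, 0.81)

ratio = relative_dor(cross_validation, independent_validation)
print(f"DOR ratio from median values: {ratio:.1f}")
```

A ratio above 1 points in the same direction as the abstract's conclusion: performance reported from cross-validation is more optimistic than performance in independent validation.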
Pages: 189-202
Page count: 14