Selecting a single model or combining multiple models for microarray-based classifier development? - A comparative analysis based on large and diverse datasets generated from the MAQC-II project

Cited by: 11
Authors
Chen, Minjun [1]
Shi, Leming [1]
Kelly, Reagan [2]
Perkins, Roger [1]
Fang, Hong [2]
Tong, Weida [1]
Affiliations
[1] US FDA, Ctr Bioinformat, Div Syst Biol, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] US FDA, ICF Int, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
Source
BMC BIOINFORMATICS | 2011 / Volume 12
Keywords
GENE-EXPRESSION SIGNATURE; BREAST-CANCER; MOLECULAR CLASSIFICATION; CROSS-VALIDATION; PREDICTION; SENSITIVITY; PROFILES; SURVIVAL
DOI
10.1186/1471-2105-12-S10-S3
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Genomic biomarkers play an increasing role in both preclinical and clinical applications. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i.e., genomic biomarkers) capable of reliable prediction for an observed or measured outcome (i.e., endpoint) of unknown samples in preclinical and clinical practice remains a considerable challenge. No straightforward guidelines exist for selecting a single model that will perform best when presented with unknown samples. In the second phase of the MicroArray Quality Control (MAQC-II) project, 36 analysis teams produced a large number of models for 13 preclinical and clinical endpoints. Before external validation was performed, each team nominated one model per endpoint (referred to here as 'nominated models') from which MAQC-II experts selected 13 'candidate models' to represent the best model for each endpoint. Both the nominated and candidate models from MAQC-II provide benchmarks to assess other methodologies for developing microarray-based predictive models.
Methods: We developed a simple ensemble method by taking a number of the top performing models from cross-validation and developing an ensemble model for each of the MAQC-II endpoints. We compared the ensemble models with both nominated and candidate models from MAQC-II using blinded external validation.
Results: For 10 of the 13 MAQC-II endpoints originally analyzed by the MAQC-II data analysis team from the National Center for Toxicological Research (NCTR), the ensemble models achieved equal or better predictive performance than the NCTR nominated models. Additionally, the ensemble models had performance comparable to the MAQC-II candidate models. Most ensemble models also had better performance than the nominated models generated by five other MAQC-II data analysis teams that analyzed all 13 endpoints.
Conclusions: Our findings suggest that an ensemble method can often attain a higher average predictive performance in an external validation set than a corresponding "optimized" model method. Using an ensemble method to determine a final model is a potentially important supplement to the good modeling practices recommended by the MAQC-II project for developing microarray-based genomic biomarkers.
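The ensemble idea described under Methods (keep several of the top performing models from cross-validation and combine them) can be illustrated with a minimal sketch. The abstract does not state how many models were kept, which performance metric ranked them, or how their predictions were combined, so everything below is an assumption for illustration only: the function names `select_top_models` and `ensemble_predict`, the toy models, the scores, and the use of an unweighted majority vote are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A "model" here is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def select_top_models(cv_scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest cross-validation score."""
    ranked = sorted(cv_scores, key=lambda name: cv_scores[name], reverse=True)
    return ranked[:k]


def ensemble_predict(models: Dict[str, Model], selected: List[str],
                     sample: Sequence[float]) -> int:
    """Combine the selected models by unweighted majority vote (assumed rule)."""
    votes = Counter(models[name](sample) for name in selected)
    return votes.most_common(1)[0][0]


# Hypothetical toy models and cross-validation scores, for illustration only.
models: Dict[str, Model] = {
    "m_a": lambda x: int(x[0] > 0.5),
    "m_b": lambda x: int(x[1] > 0.5),
    "m_c": lambda x: int(x[0] + x[1] > 1.0),
    "m_d": lambda x: 0,
}
cv_scores = {"m_a": 0.71, "m_b": 0.64, "m_c": 0.69, "m_d": 0.10}

# Keep the three best models; an odd count avoids ties for binary labels.
selected = select_top_models(cv_scores, k=3)
prediction = ensemble_predict(models, selected, [0.9, 0.2])
```

With an odd number of selected models and binary labels, the vote cannot tie; other combination rules (for example, averaging predicted probabilities) would fit the same selection step.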
Pages: 9