BDVal: reproducible large-scale predictive model development and validation in high-throughput datasets

Times cited: 4
Authors
Dorff, Kevin C. [1]
Chambwe, Nyasha [1, 2]
Srdanovic, Marko [1]
Campagne, Fabien [1, 2]
Affiliations
[1] Cornell Univ, Weill Med Coll, Dept Physiol & Biophys, New York, NY 10021 USA
[2] Cornell Univ, Weill Med Coll, Inst Computat Biomed, New York, NY 10021 USA
Funding
National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/btq463
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
High-throughput data can be used in conjunction with clinical information to develop predictive models. Automating the process of developing, evaluating and testing such predictive models on different datasets would minimize operator errors and facilitate the comparison of different modeling approaches on the same dataset. Complete automation would also yield unambiguous documentation of the process followed to develop each model. We present the BDVal suite of programs that fully automate the construction of predictive classification models from high-throughput data and generate detailed reports about the model construction process. We have used BDVal to construct models from microarray and proteomics data, as well as from DNA-methylation datasets. The programs are designed for scalability and support the construction of thousands of alternative models from a given dataset and prediction task.
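To make the idea of automated, reproducible model comparison concrete, here is a minimal Python sketch. It is illustrative only and is not BDVal's code or interface; every function name (`rank_features`, `stratified_folds`, `cross_validate`, `evaluate_configurations`, `make_synthetic`) is hypothetical. It assumes a two-class task, uses a simple t-like feature score and a nearest-centroid classifier as stand-ins for real modeling approaches, and evaluates several alternative configurations (number of selected features) with the same fixed-seed stratified cross-validation. Feature selection is repeated inside each training fold so held-out samples never influence which features are chosen.

```python
import random
import statistics
from typing import Dict, List, Sequence

Row = Sequence[float]


def rank_features(X: List[Row], y: List[int]) -> List[int]:
    """Order feature indices by |mean(class 1) - mean(class 0)| / overall std (a simple t-like score)."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, label in zip(X, y) if label == 0]
        b = [x[j] for x, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a + b) or 1e-9  # guard against constant features
        scores.append(abs(statistics.mean(b) - statistics.mean(a)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Nearest-centroid classifier: per-class mean of the selected features."""
    return {
        label: [statistics.mean([x[j] for x, l in zip(X, y) if l == label]) for j in features]
        for label in (0, 1)
    }


def predict(centroids, features, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def stratified_folds(y, k, seed):
    """Split sample indices into k folds, preserving class proportions; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validate(X, y, n_features, k=5, seed=42):
    """Estimate accuracy; feature selection is redone inside every training fold."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [l for i, l in enumerate(y) if i not in held_out]
        # Selecting features on the training fold only keeps test samples from biasing the estimate.
        features = rank_features(train_X, train_y)[:n_features]
        centroids = fit_centroids(train_X, train_y, features)
        correct += sum(predict(centroids, features, X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def evaluate_configurations(X, y, feature_counts, seed=42) -> Dict[int, float]:
    """Run the same protocol for every configuration so the results are directly comparable."""
    return {n: cross_validate(X, y, n, seed=seed) for n in feature_counts}


def make_synthetic(seed=0, n_samples=40, n_features=50, n_informative=3):
    """Hypothetical toy data: only the first n_informative features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) + (2.0 * label if j < n_informative else 0.0) for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for n, acc in evaluate_configurations(X, y, [1, 5, 20]).items():
        print(f"{n:>2} features: cross-validated accuracy {acc:.2f}")
```

Because fold assignment depends only on `seed`, rerunning the script reproduces the same report; this is what makes comparing many configurations on one dataset meaningful. A full tool would also log every configuration and its result for documentation, which this sketch omits.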
Pages: 2472-2473
Page count: 2