Balancing the Robustness and Predictive Performance of Biomarkers

Cited by: 11
Authors
Kirk, Paul [1]
Witkover, Aviva [2]
Bangham, Charles R. M. [2]
Richardson, Sylvia [5]
Lewin, Alexandra M. [3]
Stumpf, Michael P. H. [4]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Div Mol Biosci, London SW7 2AZ, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Immunol, London SW7 2AZ, England
[3] Univ London Imperial Coll Sci Technol & Med, Dept Epidemiol & Publ Hlth, London SW7 2AZ, England
[4] Univ London Imperial Coll Sci Technol & Med, Ctr Bioinformat, London SW7 2AZ, England
[5] MRC Biostat Unit, Cambridge, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK)
Keywords
machine learning; proteins; reverse engineering; statistical models; statistics; SELECTION; STABILITY; REGULARIZATION; CANCER;
DOI
10.1089/cmb.2013.0018
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Recent studies have highlighted the importance of assessing the robustness of putative biomarkers identified from experimental data. This has given rise to the concept of stable biomarkers, which are ones that are consistently identified regardless of small perturbations to the data. Since stability is not by itself a useful objective, we present a number of strategies that combine assessments of stability and predictive performance in order to identify biomarkers that are both robust and diagnostically useful. Moreover, by wrapping these strategies around logistic regression classifiers regularized by the elastic net penalty, we are able to assess the effects of correlations between biomarkers upon their perceived stability. We use a synthetic example to illustrate the properties of our proposed strategies. In this example, we find that: (i) assessments of stability can help to reduce the number of false-positive biomarkers, although potentially at the cost of missing some true positives; (ii) combining assessments of stability with assessments of predictive performance can improve the true positive rate; and (iii) correlations between biomarkers can have adverse effects on their stability and hence must be carefully taken into account when undertaking biomarker discovery. We then apply our strategies in a proteomics context to identify a number of robust candidate biomarkers for the human disease HTLV1-associated myelopathy/tropical spastic paraparesis (HAM/TSP).
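The idea of wrapping a stability assessment around an elastic-net-regularized logistic regression classifier, while also tracking predictive performance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the proximal-gradient solver, the penalty settings (`lam`, `alpha`), the half-sample resampling scheme, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def fit_enet_logistic(X, y, lam=0.1, alpha=0.5, n_iter=300):
    """Elastic-net logistic regression fitted by proximal gradient descent.

    Hypothetical helper: lam scales the whole penalty, alpha mixes L1 (alpha)
    and squared L2 (1 - alpha). The intercept b is not penalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Upper bound on the Lipschitz constant of the smooth part of the objective.
    lipschitz = 0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0) + lam * (1 - alpha)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)          # clip to avoid overflow in exp
        prob = 1.0 / (1.0 + np.exp(-z))
        grad_w = X.T @ (prob - y) / n + lam * (1 - alpha) * w
        grad_b = np.mean(prob - y)
        w = w - step * grad_w
        b = b - step * grad_b
        # Soft-thresholding applies the L1 part and yields exact zeros.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam * alpha, 0.0)
    return w, b

def selection_frequencies(X, y, n_subsamples=50, frac=0.5, seed=0, **fit_kwargs):
    """Refit on random subsamples; return per-feature selection frequency
    and the mean classification accuracy on the held-out remainder."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    accuracies = []
    for _ in range(n_subsamples):
        idx = rng.permutation(n)
        train, held = idx[:m], idx[m:]
        w, b = fit_enet_logistic(X[train], y[train], **fit_kwargs)
        counts += (w != 0)                        # which features were selected
        pred = (X[held] @ w + b > 0).astype(float)
        accuracies.append(np.mean(pred == y[held]))
    return counts / n_subsamples, float(np.mean(accuracies))

# Synthetic example (assumed setup): 20 candidate features, only the first 3
# influence the class label.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
logits = 2.0 * X[:, :3].sum(axis=1)
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

freq, acc = selection_frequencies(X, y)
# Candidate biomarkers could then be ranked by freq, with acc reported
# alongside as a check that the selected features are also predictive.
ranking = np.argsort(-freq)
```

In this sketch, a feature with a selection frequency close to 1 is one that is chosen regardless of which half of the data is used, which is the sense of "stable" in the abstract. Raising `alpha` toward 1 moves the penalty toward the lasso, which tends to pick one feature from a correlated group, so the selection frequencies of correlated features may change with `alpha`.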
Pages: 979-989
Number of pages: 11