Risk estimation and risk prediction using machine-learning methods

Cited by: 124
Authors
Kruppa, Jochen [1]
Ziegler, Andreas [1]
Koenig, Inke R. [1]
Affiliations
[1] Univ Lubeck, Inst Med Biometrie & Stat, Univ Klinikum Schleswig Holstein, D-23562 Lubeck, Germany
Funding
National Institutes of Health (USA);
Keywords
SINGLE-NUCLEOTIDE POLYMORPHISMS; GENOME-WIDE ASSOCIATION; PROBABILITY ESTIMATION; DISCRIMINANT-ANALYSIS; OPTIMAL NUMBER; SAMPLE-SIZE; CLASSIFICATION; REGRESSION; DIMENSIONALITY; PERFORMANCE;
DOI
10.1007/s00439-012-1194-y
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis.
Pages: 1639-1654
Page count: 16