A random forest classifier for lymph diseases

Cited by: 150
Authors
Azar, Ahmad Taher [1]
Elshazly, Hanaa Ismail [2, 3]
Hassanien, Aboul Ella [2, 3]
Elkorany, Abeer Mohamed [2]
Affiliations
[1] Benha Univ, Fac Comp & Informat, Cairo, Egypt
[2] Cairo Univ, Fac Comp & Informat, Giza, Egypt
[3] Sci Res Grp Egypt, Cairo, Egypt
Keywords
Machine learning (ML); Feature selection (FS); Genetic algorithm (GA); Random forest classifier (RFC); Lymph diseases; FLOATING SEARCH METHODS; FEATURE-SELECTION; RULE EXTRACTION; DIMENSIONALITY; LYMPHOGRAPHY; RECOGNITION; ALGORITHMS; DIAGNOSIS; SYSTEM
DOI
10.1016/j.cmpb.2013.11.004
CLC (Chinese Library Classification) number
TP39 [Computer applications]
Subject classification code
080201 [Mechanical manufacturing and automation]
Abstract
Machine learning-based classification techniques support the decision-making process in many areas of health care, including diagnosis, prognosis, and screening. Feature selection (FS) is expected to improve classification performance, particularly when the data are high-dimensional, that is, when there are relatively few training examples compared with a large number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. The first stage of the proposed system focuses on feature selection: it applies diverse feature selection algorithms, such as a genetic algorithm (GA), principal component analysis (PCA), Relief-F, Fisher, sequential forward floating search (SFFS), and sequential backward floating search (SBFS), to reduce the dimension of the lymph diseases dataset. In the second stage, switching from feature selection to model construction, the obtained feature subsets are fed into the RFC for efficient classification. GA-RFC achieved the highest classification accuracy, 92.2%, and GA reduced the input feature space from eighteen features to six. (C) 2013 Elsevier Ireland Ltd. All rights reserved.
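The GA-RFC pipeline in the abstract is a wrapper search: a genetic algorithm proposes feature subsets, and each subset is scored by the classifier trained on it. The sketch below illustrates that search under stated assumptions. It encodes a subset as a bit mask with one bit per feature (18 for the lymph data), and it uses binary tournament selection, one-point crossover, bit-flip mutation, and elitism. These operators and all parameter values are assumptions, not the paper's reported GA settings. The fitness function is pluggable: in the described system it would be the accuracy of an RFC trained on the kept features. Here `toy_fitness` and `INFORMATIVE` are hypothetical stand-ins so that the example runs with the Python standard library only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # one bit per feature: 1 = keep, 0 = drop


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Search bit masks over n_features (>= 2); return best mask, its score, best-score history."""
    rng = random.Random(seed)

    def repair(mask: Mask) -> Mask:
        # An empty subset cannot train a classifier, so switch one random bit on.
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        return mask

    pop = [repair([rng.randint(0, 1) for _ in range(n_features)]) for _ in range(pop_size)]
    scores = [fitness(m) for m in pop]
    history: List[float] = []

    def tournament() -> Mask:
        # Binary tournament: the fitter of two randomly drawn individuals is selected.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    for _ in range(generations):
        best = max(range(pop_size), key=lambda k: scores[k])
        history.append(scores[best])
        children = [pop[best][:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation: each bit flips independently with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(repair(child))
        pop = children
        scores = [fitness(m) for m in pop]

    best = max(range(pop_size), key=lambda k: scores[k])
    history.append(scores[best])
    return pop[best], scores[best], history


# Hypothetical stand-in for "accuracy of an RFC trained on the kept features":
# rewards six made-up informative feature indices and penalises subset size.
INFORMATIVE = {0, 3, 5, 8, 12, 15}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.02 * sum(mask)


if __name__ == "__main__":
    mask, score, history = ga_feature_selection(18, toy_fitness, seed=1)
    print("kept features:", [i for i, bit in enumerate(mask) if bit], "score:", round(score, 3))
```

To move closer to the GA-RFC setup, `fitness` could instead return the mean of scikit-learn's `cross_val_score(RandomForestClassifier(random_state=0), X[:, kept], y, cv=5)`, where `kept` is the list of indices whose bit is 1. Fixing `random_state` keeps the fitness deterministic. The elitism step depends on that determinism to guarantee that the best score never decreases from one generation to the next.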
Pages: 465-473
Number of pages: 9