GMDH-based feature ranking and selection for improved classification of medical data

Cited by: 56
Author
Abdel-Aal, RE [1]
Affiliation
[1] King Fahd Univ Petr & Minerals, Dept Phys, Dhahran 31261, Saudi Arabia
Keywords
abductive networks; neural networks; feature ranking; feature selection; dimensionality reduction; classification accuracy; ROC characteristics; medical diagnosis; breast cancers; heart disease;
DOI
10.1016/j.jbi.2005.03.003
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Medical applications are often characterized by a large number of disease markers and a relatively small number of data records. We demonstrate that complete feature ranking followed by selection can lead to appreciable reductions in data dimensionality, with significant improvements in the implementation and performance of classifiers for medical diagnosis. We describe a novel approach for ranking all features according to their predictive quality using properties unique to learning algorithms based on the group method of data handling (GMDH). An abductive network training algorithm is repeatedly used to select groups of optimum predictors from the feature set at gradually increasing levels of model complexity specified by the user. Groups selected earlier are better predictors. The process is then repeated to rank features within individual groups. The resulting full feature ranking can be used to determine the optimum feature subset by starting at the top of the list and progressively including more features until the classification error rate on an out-of-sample evaluation set starts to increase due to overfitting. The approach is demonstrated on two medical diagnosis datasets (breast cancer and heart disease) and comparisons are made with other feature ranking and selection methods. Receiver operating characteristics (ROC) analysis is used to compare classifier performance. At default model complexity, dimensionality reductions of 22% and 54% could be achieved for the breast cancer and heart disease data, respectively, leading to improvements in the overall classification performance. For both datasets, considerable dimensionality reduction introduced no significant reduction in the area under the ROC curve. GMDH-based feature selection results have also proved effective with neural network classifiers. (c) 2005 Elsevier Inc. All rights reserved.
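The subset-selection step described in the abstract (walk down the ranked list, adding features until the out-of-sample error starts to rise) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `select_top_ranked` and `dimensionality_reduction`, the feature names, and the toy error values are all hypothetical, and the GMDH ranking itself is assumed to have been computed already.

```python
from typing import Callable, List, Sequence, Tuple


def select_top_ranked(
    ranking: Sequence[str],
    eval_error: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Walk down a ranked feature list; stop when evaluation error rises.

    `ranking` lists features from best to worst predictor.
    `eval_error` is a hypothetical callback that trains a classifier on the
    given features and returns its error rate on an out-of-sample set.
    """
    if not ranking:
        raise ValueError("ranking must contain at least one feature")
    best_subset = [ranking[0]]
    best_error = eval_error(best_subset)
    for k in range(2, len(ranking) + 1):
        candidate = list(ranking[:k])
        error = eval_error(candidate)
        if error > best_error:
            # Error started to increase: treat as the onset of overfitting.
            break
        # Ties are accepted, so the larger subset is kept on equal error.
        best_subset, best_error = candidate, error
    return best_subset, best_error


def dimensionality_reduction(n_total: int, n_selected: int) -> float:
    """Percentage of features dropped relative to the full feature set."""
    return 100.0 * (n_total - n_selected) / n_total


# Toy illustration only: invented error rates indexed by subset size.
toy_errors = {1: 0.30, 2: 0.22, 3: 0.18, 4: 0.21, 5: 0.25}
ranking = ["f3", "f1", "f5", "f2", "f4"]
subset, err = select_top_ranked(ranking, lambda feats: toy_errors[len(feats)])
reduction = dimensionality_reduction(len(ranking), len(subset))
```

In this toy run the walk stops at the fourth feature, so the first three ranked features are kept. In practice `eval_error` would wrap whatever classifier is being evaluated, for example an abductive network or a neural network as mentioned in the abstract.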
Pages: 456-468
Number of pages: 13