Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features

Cited by: 120
Authors
Zhou, Qifeng [1 ]
Zhou, Hao [1 ]
Li, Tao [2 ,3 ]
Affiliations
[1] Xiamen Univ, Sch Aerosp Engn, Automat Dept, Xiamen 361005, Peoples R China
[2] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL 33199 USA
[3] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, Nanjing 210046, Jiangsu, Peoples R China
Keywords
Cost sensitive; Feature selection; Random forest; MODEL;
DOI
10.1016/j.knosys.2015.11.010
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection aims to select a small subset of informative features that contain most of the information relevant to a given task. Existing feature selection methods often assume that all features have the same cost. However, in many real-world applications, different features may have different costs (e.g., the different tests a patient might undergo in medical diagnosis). Ignoring feature cost may produce feature subsets that are good in theory but cannot be used in practice. In this paper, we propose a random forest-based feature selection algorithm that incorporates feature cost into the base decision tree construction process to produce low-cost feature subsets. In particular, when constructing a base tree, a feature is randomly selected with a probability inversely proportional to its associated cost. We evaluate the proposed method on a number of UCI datasets and apply it to a medical diagnosis problem in which the real feature costs are estimated by experts. The experimental results demonstrate that our feature-cost-sensitive random forest (FCS-RF) is able to select a low-cost subset of informative features and achieves better performance than other state-of-the-art feature selection methods on real-world problems. (C) 2015 Elsevier B.V. All rights reserved.
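The sampling step the abstract describes (picking a candidate feature with probability inversely proportional to its cost) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name, the cost values, and the use of Python's `random.choices` are all assumptions for demonstration.

```python
import random

def sample_feature(costs, rng=None):
    """Pick one feature index with probability inversely proportional to its cost.

    costs: list of positive feature costs (illustrative values).
    """
    rng = rng or random.Random(0)
    weights = [1.0 / c for c in costs]       # cheaper features get larger weights
    total = sum(weights)
    probs = [w / total for w in weights]     # normalize into a distribution
    return rng.choices(range(len(costs)), weights=probs, k=1)[0]

# Illustration: feature 0 costs 1.0, feature 1 costs 10.0, so feature 0
# should be drawn roughly ten times as often over many trials.
costs = [1.0, 10.0]
picks = [sample_feature(costs, random.Random(i)) for i in range(1000)]
```

In the paper's setting this draw would replace the uniform random feature subsampling performed at each node of a base tree, biasing every tree in the forest toward cheap features while keeping the ensemble's randomization.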
Pages: 1-11 (11 pages)