PRIM versus CART in subgroup discovery: When patience is harmful

Cited by: 16
Authors
Abu-Hanna, Ameen [1]
Nannings, Barry [1]
Dongelmans, Dave [2]
Hasman, Arie [1]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, Dept Med Informat, NL-1105 AZ Amsterdam, Netherlands
[2] Univ Amsterdam, Acad Med Ctr, Dept Intens Care, NL-1105 AZ Amsterdam, Netherlands
Keywords
CART (Classification and Regression Trees); PRIM (Patient Rule Induction Method); Subgroup discovery; Coverage; Patience; High-dimensionality; Clinical databases; Ordinal scores; Bootstrap; RULE-INDUCTION METHOD; ACUTE PHYSIOLOGY; RISK
DOI
10.1016/j.jbi.2010.05.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large real-world high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling off" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, ultimately missing an important subgroup. This finding has considerable significance in clinical medicine, where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase when global information about (ordinal) variables is better put to use and when the search algorithm keeps track of alternative solutions. © 2010 Elsevier Inc. All rights reserved.
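The "large chunk" effect mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not a full PRIM implementation: the function name `lower_peel_fraction`, the patience value `alpha = 0.05`, and the toy data are all assumptions made for illustration. The sketch only shows that a single peel at the low end of a heavily tied ordinal score cannot remove a small fraction of the data, because all observations sharing the lowest value leave the box together.

```python
import numpy as np


def lower_peel_fraction(x, alpha=0.05):
    """Fraction of observations removed by one low-end peel of x.

    Hypothetical helper: the cut is placed at the alpha-quantile and
    every observation at or below it is peeled off. With tied values,
    all ties at the cut leave together.
    """
    x = np.asarray(x, dtype=float)
    threshold = np.quantile(x, alpha)
    return float(np.mean(x <= threshold))


# Toy continuous variable: 1000 distinct values, so a peel can
# remove close to the requested 5% of the data.
continuous = np.arange(1000)

# Toy ordinal score with four levels; 40% of cases share the lowest level.
ordinal_score = np.repeat([0, 1, 2, 3], [400, 300, 200, 100])

print(lower_peel_fraction(continuous))     # small peel, near alpha
print(lower_peel_fraction(ordinal_score))  # whole lowest level removed
```

On this toy data the continuous variable loses about 5% of its observations, while the ordinal score loses its entire lowest level (40% of the data) in one step. How a particular PRIM implementation scores such a peel against smaller peels on other variables is not specified here.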
Pages: 701-708
Page count: 8
Related papers
26 in total
[1]  
[Anonymous], 1984, OLSHEN STONE CLASSIF, DOI 10.2307/2530946
[2]   Bump hunting for risk: a new data mining tool and its applications [J].
Becker, U ;
Fahrmeir, L .
COMPUTATIONAL STATISTICS, 2001, 16 (03) :373-386
[3]   Thinking inside the box: A participatory, computer-assisted approach to scenario discovery [J].
Bryant, Benjamin P. ;
Lempert, Robert J. .
TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2010, 77 (01) :34-49
[4]   Flexible patient rule induction method for optimizing process variables in discrete type [J].
Chong, Il-Gyo ;
Jun, Chi-Hyuck .
EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (04) :3014-3020
[5]   A data mining approach to process optimization without an explicit quality function [J].
Chong, Il-Gyo ;
Albin, Susan L. ;
Jun, Chi-Hyuck .
IIE TRANSACTIONS, 2007, 39 (08) :795-804
[6]   Controlling false-negative errors in microarray differential expression analysis: a PRIM approach [J].
Cole, SW ;
Galic, Z ;
Zack, JA .
BIOINFORMATICS, 2003, 19 (14) :1808-1816
[7]  
Duan LD, 2009, INT S HIGH PERF COMP, P129, DOI 10.1109/HPCA.2009.4798244
[8]   An application of the patient rule-induction method for evaluating the contribution of the Apolipoprotein E and Lipoprotein Lipase genes to predicting ischemic heart disease [J].
Dyson, Greg ;
Frikke-Schmidt, Ruth ;
Nordestgaard, Borge G. ;
Tybjaerg-Hansen, Anne ;
Sing, Charles F. .
GENETIC EPIDEMIOLOGY, 2007, 31 (06) :515-527
[9]   Modifications to the Patient Rule-Induction Method That Utilize Non-Additive Combinations of Genetic and Environmental Effects to Define Partitions That Predict Ischemic Heart Disease [J].
Dyson, Greg ;
Frikke-Schmidt, Ruth ;
Nordestgaard, Borge G. ;
Tybjaerg-Hansen, Anne ;
Sing, Charles F. .
GENETIC EPIDEMIOLOGY, 2009, 33 (04) :317-324
[10]   Discussion on the paper by Friedman and Fisher [J].
A. J. Feelders .
Statistics and Computing, 1999, 9 (2) :147-148