Classification by ensembles from random partitions of high-dimensional data

Cited by: 73
Authors
Ahn, Hongshik [1]
Moon, Hojin
Fazzari, Melissa J.
Lim, Noha
Chen, James J.
Kodell, Ralph L.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Natl Ctr Toxicol Res, Div Biometry & Risk Assessment, Jefferson, AR 72079 USA
[3] Univ Arkansas Med Sci, Dept Biostat, Little Rock, AR 72205 USA
Keywords
class prediction; classification tree; cross validation; logistic regression; majority voting; risk profiling
DOI
10.1016/j.csda.2006.12.043
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity more adequately than many other classification methods considered in this study. (C) 2007 Elsevier B.V. All rights reserved.
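The procedure outlined in the abstract (randomly partition the predictors, fit one classifier per subset, combine by majority voting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the paper builds a classification tree or logistic regression tree on each subset, whereas this sketch swaps in a nearest-centroid rule as a stand-in base learner so that it runs with the standard library alone. All function names, the toy data, and the choice of three subsets are assumptions made for illustration.

```python
import random
from collections import Counter


def random_partition(n_features, n_subsets, rng):
    """Shuffle the feature indices and deal them into disjoint subsets."""
    indices = list(range(n_features))
    rng.shuffle(indices)
    return [indices[i::n_subsets] for i in range(n_subsets)]


def fit_centroids(X, y, features):
    """Stand-in base learner (assumption): per-class mean over `features`."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row, features):
    """Predict the class whose centroid is nearest (squared Euclidean)."""
    def dist(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(sorted(centroids), key=dist)


def fit_ensemble(X, y, n_subsets, seed=0):
    """Fit one base classifier per subset of a single random partition."""
    rng = random.Random(seed)
    parts = random_partition(len(X[0]), n_subsets, rng)
    return [(features, fit_centroids(X, y, features)) for features in parts]


def predict_ensemble(ensemble, row):
    """Combine the base classifiers' predictions by majority vote."""
    votes = Counter(predict_centroids(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): six predictors, two well-separated classes.
X_train = [
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.2],
    [0.0, 0.2, 0.1, 0.0, 0.1, 0.1],
    [0.9, 1.0, 0.8, 1.1, 0.9, 1.0],
    [1.0, 0.9, 1.1, 0.9, 1.0, 0.8],
]
y_train = [0, 0, 1, 1]

ensemble = fit_ensemble(X_train, y_train, n_subsets=3)
pred_low = predict_ensemble(ensemble, [0.05] * 6)
pred_high = predict_ensemble(ensemble, [0.95] * 6)
```

Using an odd number of subsets avoids tied votes in the two-class case. Each base learner sees only its own subset of predictors, which is the source of the computational advantage the abstract describes for high-dimensional data.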
Pages: 6166-6179
Number of pages: 14
Related papers
43 in total
[1]   Tree-structured logistic model for over-dispersed binomial data with application to modeling developmental effects [J].
Ahn, H ;
Chen, JJ .
BIOMETRICS, 1997, 53 (02) :435-455
[2]   NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION [J].
AKAIKE, H .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) :716-723
[3]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[4]   Selection bias in gene extraction on the basis of microarray gene-expression data [J].
Ambroise, C ;
McLachlan, GJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (10) :6562-6566
[5]  
[Anonymous], 2004, 666 U CAL DEP STAT
[6]  
[Anonymous], 2004, MACHINE LEARNING BEN
[7]   The estrogen receptor relative binding affinities of 188 natural and xenochemicals: Structural diversity of ligands [J].
Blair, RM ;
Fang, H ;
Branham, WS ;
Hass, BS ;
Dial, SL ;
Moland, CL ;
Tong, WD ;
Shi, LM ;
Perkins, R ;
Sheehan, DM .
TOXICOLOGICAL SCIENCES, 2000, 54 (01) :138-153
[8]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[9]   Phytoestrogens and mycoestrogens bind to the rat uterine estrogen receptor [J].
Branham, WS ;
Dial, SL ;
Moland, CL ;
Hass, BS ;
Blair, RM ;
Fang, H ;
Shi, LM ;
Tong, WD ;
Perkins, RG ;
Sheehan, DM .
JOURNAL OF NUTRITION, 2002, 132 (04) :658-664
[10]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32