Uncovering Bivariate Interactions in High Dimensional Data Using Random Forests with Data Augmentation

Cited by: 1
Authors
Arevalillo, Jorge M. [1]
Navarro, Hilario [1]
Affiliations
[1] UNED Univ, Dept Stat & Operat Res, Madrid 28040, Spain
Keywords
Bivariate interactions; random forests; high dimensional data; differentially expressed genes; statistical methods; microarray data; selection; classification; noise
DOI
10.3233/FI-2011-602
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081205 [Computer software]
Abstract
Random Forests (RF) is an ensemble technique for classification and regression that has become widely accepted in the bioinformatics community in the last few years. Its predictive strength, along with the information-rich utilities provided by its output, has made RF an efficient data mining tool for discovering patterns in high dimensional data. In this paper we propose a search strategy that explores a subset of the input space exhaustively, using RF as the search engine. Our procedure begins by taking the variables previously rejected by a sequential search procedure and uses the out-of-bag error rate of the ensemble, obtained when trained over an augmented data set, as the criterion to capture difficult-to-uncover bivariate patterns associated with an outcome variable. We show the performance of the procedure in some synthetic scenarios and give an application to a real microarray experiment to illustrate how it works for gene expression data.
Pages: 97-115
Page count: 19