Identification of interaction patterns and classification with applications to microarray data

Times cited: 8
Authors
Boulesteix, AL [1]
Tutz, G [1]
Affiliation
[1] Univ Munich, Dept Stat, Seminar Appl Stochast, D-80799 Munich, Germany
Keywords
classification trees; discrimination; gene expression; emerging patterns
DOI
10.1016/j.csda.2004.10.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Emerging patterns represent a class of interaction structures that has recently been proposed as a tool in data mining. A new, more general definition based on the underlying probabilities is proposed. The resulting interaction patterns (IPs) carry information about how relevant combinations of variables are for distinguishing between classes. Since they are formally quite similar to the leaves of a classification tree, a fast and simple method based on the CART algorithm is proposed to find the corresponding empirical patterns in data sets. Simulations show that the method is quite effective in identifying patterns. In addition, the detected patterns can be used to define new variables for classification, so a simple scheme that uses the patterns to improve the performance of classification procedures is proposed. The method may also be seen as a way to improve the performance of CART with respect to both the identification of IPs and prediction accuracy. (c) 2004 Elsevier B.V. All rights reserved.
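The abstract describes reading interaction patterns off the leaves of a classification tree and then reusing them as new predictors. The Python sketch below illustrates that idea on binary features under several stated assumptions: the tree is a minimal hand-rolled Gini-splitting tree without pruning (not the CART implementation used in the paper), and each leaf path is scored with the growth rate from the emerging-pattern literature (the ratio of a pattern's relative frequency in one class to that in the other), which stands in for the paper's own probability-based definition. All function names and the toy data set are hypothetical.

```python
from collections import Counter

# Assumption: a pattern is a tuple of (feature_index, value) conditions on
# binary features, e.g. ((0, 1), (3, 0)) means "x[0] == 1 and x[3] == 0".


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, used):
    """Return the unused binary feature whose 0/1 split gives the lowest weighted
    Gini impurity, provided it is strictly below the parent's; otherwise None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        if j in used:
            continue
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best, best_score = j, score
    return best


def leaf_patterns(rows, labels, max_depth, min_size=2, path=()):
    """Grow a small tree greedily and return the path to each leaf as a pattern."""
    j = None
    if len(path) < max_depth and len(labels) >= min_size:
        j = best_split(rows, labels, {k for k, _ in path})
    if j is None:
        return [path]
    patterns = []
    for v in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[j] == v]
        patterns += leaf_patterns([rows[i] for i in idx], [labels[i] for i in idx],
                                  max_depth, min_size, path + ((j, v),))
    return patterns


def matches(x, pattern):
    """True if sample x satisfies every condition of the pattern."""
    return all(x[j] == v for j, v in pattern)


def growth_rate(pattern, rows, labels, target=1):
    """Relative support of the pattern in class `target` divided by its
    relative support in the remaining samples (inf if the latter is zero)."""
    pos = [x for x, y in zip(rows, labels) if y == target]
    neg = [x for x, y in zip(rows, labels) if y != target]
    s_pos = sum(matches(x, pattern) for x in pos) / len(pos)
    s_neg = sum(matches(x, pattern) for x in neg) / len(neg)
    if s_neg == 0:
        return float("inf") if s_pos > 0 else 0.0
    return s_pos / s_neg


def pattern_features(rows, patterns):
    """Encode each sample as 0/1 indicators, one per pattern, for use as new predictors."""
    return [[int(matches(x, p)) for p in patterns] for x in rows]


# Hypothetical toy data: class 1 occurs exactly when x[0] == 1 and x[1] == 1;
# x[2] is pure noise.
TOY_ROWS = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
TOY_LABELS = [int(a == 1 and b == 1) for a, b, _ in TOY_ROWS]

if __name__ == "__main__":
    pats = leaf_patterns(TOY_ROWS, TOY_LABELS, max_depth=2)
    for p in pats:
        print(p, growth_rate(p, TOY_ROWS, TOY_LABELS))
    print(pattern_features(TOY_ROWS, pats))
```

On the toy data, the leaf ((0, 1), (1, 1)) recovers the interaction between x[0] and x[1] and has an infinite growth rate because it never occurs in class 0. Its indicator column could then be appended to the original features for a downstream classifier, in the spirit of the scheme the abstract describes.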
Pages: 783-802
Number of pages: 20
Related papers (21 in total)
[1] Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U;
Barkai, N;
Notterman, DA;
Gish, K;
Ybarra, S;
Mack, D;
Levine, AJ.
Proceedings of the National Academy of Sciences of the United States of America, 1999, 96(12): 6745-6750
[2] [Anonymous], 2011, Categorical data analysis
[3] [Anonymous], 1999, Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, DOI 10.1145/312129.312191
[4] A CART-based approach to discover emerging patterns in microarray data [J].
Boulesteix, AL;
Tutz, G;
Strimmer, K.
Bioinformatics, 2003, 19(18): 2465-2472
[5] Breiman L., 1998, CLASSIFICATION REGRE
[6] Boosting for tumor classification with gene expression data [J].
Dettling, M;
Bühlmann, P.
Bioinformatics, 2003, 19(9): 1061-1069
[7] Comparison of discrimination methods for the classification of tumors using gene expression data [J].
Dudoit, S;
Fridlyand, J;
Speed, TP.
Journal of the American Statistical Association, 2002, 97(457): 77-87
[8] DUDOIT S, 2000, COMP DISCR METH CLAS
[9] Friedman J., 2001, The elements of statistical learning, V1, DOI 10.1007/978-0-387-21606-5
[10] Bump hunting in high-dimensional data [J].
Friedman, JH;
Fisher, NI.
Statistics and Computing, 1999, 9(2): 123-143