Subset clustering of binary sequences, with an application to genomic abnormality data

Cited by: 18
Author
Hoff, PD [1]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[3] Univ Washington, Ctr Stat & Social Sci, Seattle, WA 98195 USA
Keywords
genetic pathway; multivariate binary data; nonparametric Bayes; unsupervised learning
DOI
10.1111/j.1541-0420.2005.00381.x
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multivariate Dirichlet process mixture model, which allows for the estimation of the number of clusters, the cluster memberships, and the cluster-specific parameters in a unified way. Such a clustering approach has applications in the analysis of genomic abnormality data, in which the development of different types of tumors may depend on the presence of certain abnormalities at subsets of locations along the genome. Additionally, such a mixture model provides a nonparametric estimation scheme for dependent sequences of binary data.
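As a rough illustration of why a Dirichlet process mixture lets the number of clusters be estimated rather than fixed in advance, the sketch below draws a random partition from the Chinese restaurant process, the partition distribution induced by a Dirichlet process prior. It is not the article's model or sampler: the function name `sample_crp_partition`, the concentration parameter `alpha`, and the seed are assumptions made only for this example, and no binary data or cluster-specific parameters are involved.

```python
import random


def sample_crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Hypothetical helper for illustration only; `alpha` is the concentration
    parameter of the assumed Dirichlet process prior.
    """
    rng = random.Random(seed)
    labels = []  # cluster label assigned to each item, in arrival order
    sizes = []   # current number of items in each cluster
    for _ in range(n):
        # An existing cluster is chosen with weight equal to its size;
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(n=50, alpha=1.0)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

The number of clusters is an outcome of the draw, not an input. In a full mixture model each cluster would also carry its own parameters, and the partition would be updated given the observed sequences.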
Pages: 1027-1036
Page count: 10