Unsupervised learning of parsimonious mixtures on large spaces with integrated feature and component selection

Cited by: 48
Authors
Graham, MW [1]
Miller, DJ [1]
Affiliations
[1] Penn State Univ, Dept Elect Engn, University Pk, PA 16802 USA
Funding
National Science Foundation (USA)
Keywords
Bayesian information criterion (BIC); document clustering; EM algorithm; mixture models; model order selection; unsupervised feature selection;
DOI
10.1109/TSP.2006.870586
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Estimating the number of components (the order) in a mixture model is often addressed using criteria such as the Bayesian information criterion (BIC) and minimum message length. However, when the feature space is very large, use of these criteria may grossly underestimate the order. Here, it is suggested that this failure is not mainly attributable to the criterion (e.g., BIC), but rather to the lack of "structure" in standard mixtures: these models trade off data fitness and model complexity only by varying the order. The authors of the present paper propose mixtures with a richer set of tradeoffs. The proposed model allows each component its own informative feature subset, with all other features explained by a common model (shared by all components). Parameter sharing greatly reduces complexity at a given order. Since the space of these parsimonious modeling solutions is vast, this space is searched in an efficient manner, integrating the component and feature selection within the generalized expectation-maximization (GEM) learning for the mixture parameters. The quality of the proposed (unsupervised) solutions is evaluated using both classification error and test set data likelihood. On text data, the proposed multinomial version (learned without labeled examples, without knowing the "true" number of topics, and without feature preprocessing) compares quite favorably with both alternative unsupervised methods and with a supervised naive Bayes classifier. A Gaussian version compares favorably with a recent method introducing "feature saliency" in mixtures.
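The abstract's claim that parameter sharing reduces complexity at a given order can be illustrated with a rough parameter count. The sketch below is not the paper's parameterization: the function names, the counting rule for the shared model, and the penalty form (half the number of free parameters times the log of the sample size, a common way of writing the BIC complexity term) are all assumptions made for illustration.

```python
import math


def standard_param_count(num_components, num_features):
    """Free parameters in a plain multinomial mixture (illustrative).

    Each component has its own distribution over all features
    (num_features - 1 free values), plus num_components - 1 mixing weights.
    """
    return (num_components - 1) + num_components * (num_features - 1)


def shared_param_count(subset_sizes, num_features):
    """Rough count for a mixture with component-specific feature subsets.

    ASSUMPTION: component j spends one parameter per feature in its own
    informative subset (subset_sizes[j]); every other feature is explained
    by a single shared distribution (num_features - 1 free values).
    The paper's exact bookkeeping may differ.
    """
    num_components = len(subset_sizes)
    return (num_components - 1) + sum(subset_sizes) + (num_features - 1)


def bic_penalty(num_params, num_samples):
    """Complexity term of a BIC-style cost: (k / 2) * ln(n)."""
    return 0.5 * num_params * math.log(num_samples)


if __name__ == "__main__":
    # Hypothetical sizes: 10 components, 1000-word vocabulary,
    # 50 informative words per component, 5000 documents.
    full = standard_param_count(10, 1000)
    shared = shared_param_count([50] * 10, 1000)
    print(full, shared)
    print(bic_penalty(full, 5000), bic_penalty(shared, 5000))
```

With these hypothetical sizes the plain mixture has 9999 free parameters and the shared variant has 1508, so the penalty for adding a component is far smaller under sharing. That is one way to read the abstract's argument that the underestimated order comes from the model structure rather than from the criterion.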
Pages: 1289-1303
Page count: 15