Termite: Visualization Techniques for Assessing Textual Topic Models

Cited by: 274
Authors
Chuang, Jason [1]
Manning, Christopher D. [1]
Heer, Jeffrey [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE INTERNATIONAL WORKING CONFERENCE ON ADVANCED VISUAL INTERFACES | 2012
Keywords
Topic Models; Text Visualization; Seriation
DOI
10.1145/2254556.2254572
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Topic models aid analysis of text corpora by identifying latent topics based on co-occurring words. Real-world deployments of topic models, however, often require intensive expert verification and model refinement. In this paper we present Termite, a visual analysis tool for assessing topic model quality. Termite uses a tabular layout to promote comparison of terms both within and across latent topics. We contribute a novel saliency measure for selecting relevant terms and a seriation algorithm that both reveals clustering structure and promotes the legibility of related terms. In a series of examples, we demonstrate how Termite allows analysts to identify coherent and significant themes.
Pages: 74-77
Page count: 4