Learning to classify short text from scientific documents using topic models with various types of knowledge

Cited by: 67
Authors
Vo, Duc-Thuan [1]
Ock, Cheol-Young [1]
Affiliations
[1] Univ Ulsan, Sch Elect Engn, Ulsan 680749, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Data sparseness; Information retrieval; Latent Dirichlet Allocation; Short text classification; Topic model; CATEGORIZATION; WORD;
DOI
10.1016/j.eswa.2014.09.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification of short text is challenging due to data sparseness, which is a typical characteristic of short text. In this paper, we propose methods for enhancing features using topic models, which make short text seem less sparse and more topic-oriented for classification. We exploited topic model analysis based on Latent Dirichlet Allocation for enriched datasets, and then we presented new methods for enhancing features by combining external texts from topic models that make documents more effective for classification. In experiments, we utilized the title contents of scientific articles as short text documents, and then enriched these documents using topic models from various types of universal datasets for classification in order to show that our approach performs efficiently. (C) 2014 Elsevier Ltd. All rights reserved.
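The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`train_lda`, `enrich_title`), the hyperparameters, and the rule of appending the top words of the single best-matching topic are all assumptions made for illustration. The sketch fits Latent Dirichlet Allocation on an external corpus by collapsed Gibbs sampling, then pads a short title with words from the topic it matches best, so the resulting feature set is less sparse.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Hypothetical helper: alpha, beta and iters are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]  # n(topic, word)
    topic_total = [0] * num_topics                       # n(topic)
    doc_topic = [[0] * num_topics for _ in docs]         # n(doc, topic)

    def bump(d, w, k, delta):
        topic_word[k][w] += delta
        topic_total[k] += delta
        doc_topic[d][k] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            bump(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, resample its topic, add it back.
                bump(d, w, assign[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                bump(d, w, k, +1)
    return topic_word, topic_total, vocab_size


def enrich_title(title, topic_word, topic_total, vocab_size, beta=0.01, top_n=3):
    """Append top words of the best-matching topic to a short title.

    Assumption: the topic is chosen by smoothed log-likelihood of the title
    under each topic, a simplification of full LDA inference.
    """
    tokens = title.lower().split()
    scores = [
        sum(
            math.log((topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta))
            for w in tokens
        )
        for k in range(len(topic_word))
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    extra = [w for w, c in topic_word[best].most_common() if c > 0 and w not in tokens]
    return tokens + extra[:top_n]


# Toy stand-in for an external "universal" corpus (made-up data).
corpus = [
    "topic model latent dirichlet allocation inference",
    "latent topic model for text classification",
    "protein gene expression cell biology",
    "gene cell protein sequence analysis",
]
docs = [line.split() for line in corpus]
topic_word, topic_total, vocab_size = train_lda(docs, num_topics=2)
print(enrich_title("Topic model classification", topic_word, topic_total, vocab_size))
```

The enriched token list can then be passed to any standard text classifier in place of the raw title. Which words are appended depends on the sampled topics, so the printed output varies with the corpus and seed.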
Pages: 1684-1698
Page count: 15