A Short Text Classification Method Based on Semi-Supervised Learning

Cited by: 4

Authors:

孙学琛

高志强

全志斌

施嘉鸿

Affiliation:

[1] School of Computer Science and Engineering, Southeast University

Source:

Journal of Shandong University of Technology (Natural Science Edition) | 2012 / Vol. 26 / No. 01

Keywords:

semi-supervised learning; collective classification; short text classification; data mining

DOI:

10.13367/j.cnki.sdgc.2012.01.010

CLC number:

TP391.1 [text information processing]

Discipline codes:

081203; 0835

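Abstract:

With the rapid spread and development of the World Wide Web, a large volume of short texts, such as abstracts of scientific papers, microblog posts, and e-mails, has appeared on the Web. Short texts are brief and related to one another, and labeled data for them are hard to obtain, so traditional classification methods struggle to achieve high accuracy. To address short text classification, this paper proposes an iterative classification algorithm based on semi-supervised learning (SS-ICA). It requires only a small amount of labeled data and exploits the relations between short texts to classify them iteratively. Comparisons with commonly used classification methods show that, when labeled data are scarce, SS-ICA achieves higher classification accuracy than the other classifiers.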
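The abstract names the ingredients of SS-ICA (a few labeled texts, relations between short texts, and iterative reclassification) but not its local classifier, relational features, or stopping rule. The Python sketch below is therefore only an illustration of generic iterative classification under stated assumptions, not the authors' algorithm. The multinomial naive Bayes content model, whitespace tokenization, the smoothed neighbor-label vote weighted by a hypothetical `beta`, the self-training step that refits on pseudo-labels, and all function names (`train_nb`, `content_scores`, `ss_ica_sketch`) are assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, classes, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace smoothing (assumed local classifier)."""
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for words, y in zip(docs, labels):
        word_counts[y].update(words)
        vocab.update(words)
    doc_counts = Counter(labels)
    v = max(len(vocab), 1)
    model = {}
    for c in classes:
        log_prior = math.log((doc_counts[c] + alpha) / (len(labels) + alpha * len(classes)))
        denom = sum(word_counts[c].values()) + alpha * v
        model[c] = (log_prior, word_counts[c], denom)
    return model, alpha


def content_scores(nb, words):
    """Return log P(c) + sum over words of log P(w | c) for every class c."""
    model, alpha = nb
    return {
        c: log_prior + sum(math.log((counts[w] + alpha) / denom) for w in words)
        for c, (log_prior, counts, denom) in model.items()
    }


def ss_ica_sketch(texts, edges, seed_labels, classes, beta=1.0, max_iter=10):
    """Hypothetical semi-supervised iterative classification over linked short texts."""
    docs = [t.lower().split() for t in texts]  # naive whitespace tokenization
    neighbors = defaultdict(set)
    for i, j in edges:  # relations between texts are treated as undirected
        neighbors[i].add(j)
        neighbors[j].add(i)
    unlabeled = [i for i in range(len(docs)) if i not in seed_labels]
    labels = dict(seed_labels)

    # Bootstrap: content-only predictions from the few labeled texts.
    nb = train_nb([docs[i] for i in seed_labels], list(seed_labels.values()), classes)
    for i in unlabeled:
        scores = content_scores(nb, docs[i])
        labels[i] = max(classes, key=scores.get)

    for _ in range(max_iter):
        # Assumed self-training step: refit on seeds plus current pseudo-labels.
        nb = train_nb(docs, [labels[i] for i in range(len(docs))], classes)
        changed = 0
        for i in unlabeled:
            scores = content_scores(nb, docs[i])
            votes = Counter(labels[j] for j in neighbors[i])
            k = len(neighbors[i])
            for c in classes:
                # Smoothed share of neighbors currently labeled c (relational feature).
                scores[c] += beta * math.log((votes[c] + 1) / (k + len(classes)))
            best = max(classes, key=scores.get)
            if best != labels[i]:
                labels[i] = best  # in-place update: later texts see it in this sweep
                changed += 1
        if changed == 0:  # converged: a full sweep changed no label
            break
    return labels


if __name__ == "__main__":
    texts = [
        "neural network training deep learning",  # 0, labeled ml
        "protein gene expression cell",           # 1, labeled bio
        "deep network learning",                  # 2
        "gene cell sequencing",                   # 3
        "",                                       # 4, no usable words
    ]
    edges = [(0, 2), (1, 3), (4, 0), (4, 2)]
    print(ss_ica_sketch(texts, edges, {0: "ml", 1: "bio"}, classes=["bio", "ml"]))
    # {0: 'ml', 1: 'bio', 2: 'ml', 3: 'bio', 4: 'ml'}
```

In this toy graph, text 4 has no words, so its content score reduces to the class priors; it ends up labeled `ml` only because both of its neighbors are. Updating labels in place during a sweep is one common choice for iterative classification; a synchronous update, where all texts are relabeled from the previous sweep's labels, is an equally plausible alternative.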

Page numbers: 1 / 4

Pages: 4