Neighbor-weighted K-nearest neighbor for unbalanced text corpus

Cited by: 258

Author:

Tan, SB

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Software Dept, Beijing 100080, Peoples R China

[2] Chinese Acad Sci, Grad Sch, Beijing, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2005 / Vol. 28 / Issue 04

Keywords:

text classification; K-nearest neighbor (KNN); information retrieval; data mining;

DOI:

10.1016/j.eswa.2004.12.023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text categorization or classification is the automated assigning of text documents to pre-defined classes based on their contents. Many classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets often appear in practical applications. To deal with uneven text sets, we propose the neighbor-weighted K-nearest neighbor algorithm (NWKNN). The experimental results indicate that NWKNN achieves a significant improvement in classification performance on imbalanced corpora. (c) 2005 Elsevier Ltd. All rights reserved.

Pages: 667-671

Number of pages: 5
