Analysis of statistical question classification for fact-based questions

Times cited: 76
Authors
Metzler, D [1]
Croft, WB [1]
Affiliation
[1] Univ Massachusetts, Amherst, MA 01003 USA
Source
INFORMATION RETRIEVAL | 2005 / Vol. 8 / Issue 3
Funding
U.S. National Science Foundation
Keywords
question classification; question answering; machine learning; Support Vector Machines; syntactic features; semantic features; WordNet;
DOI
10.1007/s10791-005-6995-3
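Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812 (Computer Science and Technology)
Abstract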
Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.
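As a rough illustration of what a statistical (learned rather than rule-based) question classifier looks like, the sketch below assigns expected-answer-type labels to questions using lowercased unigram and bigram features and a multi-class perceptron. This is not the authors' system: the paper uses Support Vector Machines and also studies semantic (e.g. WordNet-based) features, whereas the perceptron here is a simpler linear learner swapped in for brevity. The label set (DATE, LOCATION, NUMBER, PERSON), the toy training questions, and the names `extract_features` and `PerceptronQuestionClassifier` are all hypothetical.

```python
from collections import defaultdict


def extract_features(question):
    """Hypothetical feature extractor: lowercased unigrams and bigrams only (no WordNet)."""
    tokens = question.lower().rstrip("?").split()
    feats = [f"uni={t}" for t in tokens]
    feats += [f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return feats


class PerceptronQuestionClassifier:
    """Hypothetical multi-class perceptron: one sparse weight vector per answer type."""

    def __init__(self, labels):
        self.labels = sorted(labels)  # fixed order makes tie-breaking deterministic
        self.weights = {label: defaultdict(float) for label in self.labels}

    def _score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in feats)

    def _best(self, feats):
        # max() returns the first label among ties, i.e. the alphabetically first one
        return max(self.labels, key=lambda label: self._score(feats, label))

    def train(self, data, epochs=10):
        for _ in range(epochs):
            mistakes = 0
            for question, gold in data:
                feats = extract_features(question)
                guess = self._best(feats)
                if guess != gold:
                    # Perceptron update: reward the gold label, penalize the wrong guess
                    for f in feats:
                        self.weights[gold][f] += 1.0
                        self.weights[guess][f] -= 1.0
                    mistakes += 1
            if mistakes == 0:  # every training question is classified correctly
                break

    def predict(self, question):
        return self._best(extract_features(question))


# Hypothetical toy training set, labeled by expected answer type
TRAIN = [
    ("Where is the Eiffel Tower?", "LOCATION"),
    ("Who wrote Hamlet?", "PERSON"),
    ("How many moons does Mars have?", "NUMBER"),
    ("When did World War II end?", "DATE"),
    ("Where was Mozart born?", "LOCATION"),
    ("Who invented the telephone?", "PERSON"),
    ("How many legs does a spider have?", "NUMBER"),
    ("When was the Magna Carta signed?", "DATE"),
]

classifier = PerceptronQuestionClassifier({label for _, label in TRAIN})
classifier.train(TRAIN)

print(classifier.predict("Where is Mount Everest?"))                # LOCATION
print(classifier.predict("Who painted the Mona Lisa?"))             # PERSON
print(classifier.predict("How many bones are in the human body?"))  # NUMBER
print(classifier.predict("When did the Berlin Wall fall?"))         # DATE
```

Because the decision rule is learned from labeled examples, supporting a new answer type only requires additional training questions rather than new hand-written rules. Semantic features of the kind the paper examines could, under this sketch's design, simply be appended to the list returned by the feature extractor.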
Pages: 481-504
Number of pages: 24
Cited references (34 in total)
[1] [Anonymous], 1991, P 29 ANN M ASS COMP, DOI 10.3115/981344.981378
[2] [Anonymous], ADV KERNEL METHODS
[3] Bikel, DM; Schwartz, R; Weischedel, RM. An algorithm that learns what's in a name [J]. MACHINE LEARNING, 1999, 34 (1-3): 211-231
[4] Burges, CJC. A tutorial on Support Vector Machines for pattern recognition [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02): 121-167
[5] Chinchor, N., 1998, P MUC 7
[6] Collins, M., 2002, ADV NEURAL INFORMATI, V14
[7] Davidov, D., 2004, Proceedings of Sheffield SIGIR 2004. The Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P250, DOI 10.1145/1008992.1009036
[8] DellaPietra, S; DellaPietra, V; Lafferty, J. Inducing features of random fields [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (04): 380-393
[9] Fellbaum, C., 2000, WORDNET ELECT LEXICA
[10] Hovy, E., 2001, P 1 INT C HUM LANG T, P1, DOI 10.3115/1072133.1072221