A survey of named entity recognition and classification

Cited by: 87
Authors
Nadeau, David [1]
Sekine, Satoshi [2]
Affiliations
[1] Natl Res Council Canada, 101 St Jean Bosco St, Gatineau, PQ K1A 0R6, Canada
[2] NYU, New York, NY 10003 USA
Source
LINGUISTICAE INVESTIGATIONES | 2007 / Vol. 30 / No. 1
Keywords
named entity; survey; learning method; feature space; evaluation
DOI
Not available
Chinese Library Classification number
H [Language, Writing]
Discipline classification code
05
Abstract
This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words in a document. Evaluation techniques, ranging from intuitive exact match to very complex matching techniques with adjustable cost of errors, are an indisputable key to progress.
Pages: 3-26
Page count: 24