Agreement, the F-measure, and reliability in information retrieval

Cited by: 631
Authors
Hripcsak, G [1]
Rothschild, AS [1]
Affiliations
[1] Columbia Univ, Dept Med Informat, Dept Biomed Informat, New York, NY 10032 USA
Keywords
DOI
10.1197/jamia.M1733
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the κ (kappa) statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies.
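The abstract's central identity is easy to check numerically. The following is a minimal Python sketch, not part of the original record, using hypothetical 2x2 cell counts (a = both raters positive, b and c = the two discordant cells, d = both negative). It shows that when one rater is treated as the gold standard, the other rater's F-measure equals the positive specific agreement between them, and that Cohen's κ rises toward that value as d grows large.

# Numeric check of the abstract's claim, with hypothetical counts.
# 2x2 agreement table between raters A and B:
#   a = both positive, b = A positive only, c = B positive only,
#   d = both negative (often ill-defined in information retrieval).

def f_measure(tp, fp, fn):
    # F = harmonic mean of precision and recall = 2*TP / (2*TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn)

def positive_specific_agreement(a, b, c):
    # p_pos = 2a / (2a + b + c); note that d never enters the formula
    return 2 * a / (2 * a + b + c)

def cohens_kappa(a, b, c, d):
    # kappa = (p_o - p_e) / (1 - p_e) over the full 2x2 table
    n = a + b + c + d
    p_o = (a + d) / n                                     # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

a, b, c = 40, 10, 5  # hypothetical: 40 joint positives, 15 discordant cases

# Treating rater B as the gold standard maps the cells to TP = a, FP = b, FN = c,
# so the two quantities below are algebraically identical (both print 0.8421...).
print(f_measure(tp=a, fp=b, fn=c))
print(positive_specific_agreement(a, b, c))

# kappa depends on d and climbs toward p_pos as d grows large:
for d in (10, 1_000, 100_000):
    print(d, round(cohens_kappa(a, b, c, d), 4))  # ~0.4179, ~0.8347, ~0.8420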
Pages: 296-298
Page count: 3