Area under Precision-Recall Curves for Weighted and Unweighted Data

Cited by: 114
Authors
Keilwagen, Jens [1]
Grosse, Ivo [2, 3]
Grau, Jan [2]
Affiliations
[1] Julius Kuhn Inst, Fed Res Ctr Cultivated Plants, Inst Biosafety Plant Biotechnol, Quedlinburg, Germany
[2] Univ Halle Wittenberg, Inst Comp Sci, D-06108 Halle, Saale, Germany
[3] German Ctr Integrat Biodivers Res iDiv, Leipzig, Germany
Source
PLOS ONE | 2014 / Volume 9 / Issue 03
Keywords
CLASSIFICATION; DISCOVERY;
DOI
10.1371/journal.pone.0092209
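Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Discipline classification code
07; 0710; 09
Abstract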
Precision-recall curves are highly informative about the performance of binary classifiers, and the area under these curves is a popular scalar performance measure for comparing different classifiers. However, for many applications class labels are not provided with absolute certainty, but with some degree of confidence, often reflected by weights or soft labels assigned to data points. Computing the area under the precision-recall curve requires interpolating between adjacent supporting points, but previous interpolation schemes are not directly applicable to weighted data. Hence, even in cases where weights were available, they had to be neglected when assessing classifiers using precision-recall curves. Here, we propose an interpolation for precision-recall curves that can also be used for weighted data, and we derive conditions for classification scores yielding the maximum and minimum area under the precision-recall curve. We investigate agreements and differences between the proposed interpolation and previous ones, and we demonstrate that taking into account existing weights of test data is important for the comparison of classifiers.
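To make the notion of supporting points for weighted data concrete, the following is a minimal Python sketch. It is not the interpolation proposed in the paper: it assumes each data point carries a soft label in [0, 1] (the confidence that it is positive), accumulates weighted true and false positives at each score threshold, and then approximates the area with a simple step-wise sum instead of interpolating between points. The function names `weighted_pr_points` and `step_auc_pr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_pr_points(
    scores: Sequence[float],
    pos_weights: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (recall, precision) supporting points for soft-labelled data.

    Assumption: pos_weights[i] in [0, 1] is the confidence that item i is
    positive, and 1 - pos_weights[i] counts toward the negative class.
    Hard 0/1 labels are the unweighted special case.
    """
    total_pos = sum(pos_weights)
    if total_pos <= 0:
        raise ValueError("at least some positive weight is required")

    # Rank items from highest to lowest classification score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])

    points: List[Tuple[float, float]] = []
    tp = fp = 0.0
    k = 0
    while k < len(order):
        # Items with tied scores fall under the same threshold.
        threshold = scores[order[k]]
        while k < len(order) and scores[order[k]] == threshold:
            w = pos_weights[order[k]]
            tp += w          # weighted true positives
            fp += 1.0 - w    # weighted false positives
            k += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def step_auc_pr(points: Sequence[Tuple[float, float]]) -> float:
    """Step-wise area: sum of precision times the recall increment.

    This is a simplification, not an interpolation between points.
    """
    area = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    hard = weighted_pr_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    soft = weighted_pr_points([0.9, 0.8, 0.7, 0.6], [0.9, 0.2, 0.7, 0.1])
    print(step_auc_pr(hard), step_auc_pr(soft))
```

With hard 0/1 labels the step-wise sum coincides with the familiar average precision; with soft labels the same ranking can yield a different area, which is the kind of effect the abstract refers to when it says weights matter for comparing classifiers.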
Pages: 13