Learning Bayesian classifiers from positive and unlabeled examples

Cited by: 51
Authors
Calvo, Borja [1]
Larranaga, Pedro [1]
Lozano, Jose A. [1]
Affiliations
[1] Univ Basque Country, Intelligent Syst Grp, Dept Comp Sci & Artificial Intelligence, Donostia San Sebastian 20018, Spain
Keywords
positive unlabeled learning; Bayesian classifiers; naive Bayes; tree augmented naive Bayes; Bayesian approach;
DOI
10.1016/j.patrec.2007.08.003
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term positive unlabeled learning refers to the binary classification problem in the absence of negative examples. When only positive and unlabeled instances are available, semi-supervised classification algorithms cannot be directly applied, and thus new algorithms are required. One of these positive unlabeled learning algorithms is the positive naive Bayes (PNB), which is an adaptation of the naive Bayes induction algorithm that does not require negative instances. In this work we propose two ways of enhancing this algorithm. On the one hand, we have taken the concept behind PNB one step further, proposing a procedure to build more complex Bayesian classifiers in the absence of negative instances. We present a new algorithm (named positive tree augmented naive Bayes, PTAN) to obtain tree augmented naive Bayes models in the positive unlabeled domain. On the other hand, we propose a new Bayesian approach to deal with the a priori probability of the positive class that models the uncertainty over this parameter by means of a Beta distribution. This approach is applied to both PNB and PTAN, resulting in two new algorithms. The four algorithms are empirically compared in positive unlabeled learning problems based on real and synthetic databases. The results obtained in these comparisons suggest that, when the predicting variables are not conditionally independent given the class, the extension of PNB to more complex networks increases the classification performance. They also show that our Bayesian approach to the a priori probability of the positive class can improve the results obtained by PNB and PTAN. (c) 2007 Elsevier B.V. All rights reserved.
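The abstract does not spell out how a classifier can be learned without negative instances. A minimal sketch of the general idea follows, assuming that the unlabeled set behaves like a sample from the whole population and that the prior probability of the positive class is supplied by the user. The function name `negative_conditional` and the clipping step are illustrative assumptions, not taken from the paper.

```python
def negative_conditional(p_x_unlabeled, p_x_positive, p_positive):
    """Estimate P(x | negative) without any negative examples.

    Uses the law of total probability:
        P(x) = P(x | pos) * p + P(x | neg) * (1 - p)
    which rearranges to:
        P(x | neg) = (P(x) - P(x | pos) * p) / (1 - p)

    p_x_unlabeled: P(x), estimated from the unlabeled set (assumption:
                   the unlabeled set represents the whole population).
    p_x_positive:  P(x | pos), estimated from the positive examples.
    p_positive:    assumed prior probability p of the positive class.
    """
    if not 0 <= p_positive < 1:
        raise ValueError("p_positive must lie in [0, 1)")
    estimate = (p_x_unlabeled - p_x_positive * p_positive) / (1 - p_positive)
    # Sampling noise can push the estimate outside [0, 1]; clipping is
    # one simple remedy chosen here for illustration.
    return min(max(estimate, 0.0), 1.0)
```

The estimate depends directly on the assumed prior `p_positive`, which is one way to see why the uncertainty over that parameter matters.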
Pages: 2375-2384
Number of pages: 10
Related papers
20 in total
[1]  
[Anonymous], P 12 INT C INF KNOWL
[2]  
Bernardo J. M., 1994, BAYESIAN THEORY
[3]   Building text classifiers using positive and unlabeled examples [J].
Liu, B ;
Dai, Y ;
Li, XL ;
Lee, WS ;
Yu, PS .
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, :179-186
[4]  
Bishop C. M., 2006, Pattern Recognition and Machine Learning, P179
[5]  
Blake C.L., 1998, UCI repository of machine learning databases
[6]   A partially supervised classification approach to dominant and recessive human disease gene prediction [J].
Calvo, Borja ;
Lopez-Bigas, Nuria ;
Furney, Simon J. ;
Larranaga, Pedro ;
Lozano, Jose A. .
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2007, 85 (03) :229-237
[7]  
CASTELO R, 2004, BIOINFORMATICS, V4, P169
[8]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[9]  
Denis F., 2002, P 9 INT C INF PROC M, P1927
[10]  
Denis F., 2003, Proceedings of the ICML 2003 workshop: the continuum from labeled to unlabeled data, P80