Probabilistic Classification Vector Machines

Cited by: 98
Authors
Chen, Huanhuan [1]
Tino, Peter [1]
Yao, Xin [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, CERCIA, Birmingham B15 2TT, W Midlands, England
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2009, Vol. 20, No. 6
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); Wellcome Trust;
Keywords
Bayesian classification; machine learning; probabilistic classification model; support vector machine;
DOI
10.1109/TNN.2009.2014161
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, a sparse learning algorithm, probabilistic classification vector machines (PCVMs), is proposed. We analyze relevance vector machines (RVMs) for classification problems and observe that adopting the same prior for different classes may lead to unstable solutions. To tackle this problem, a signed and truncated Gaussian prior is adopted over every weight in PCVMs, where the sign of the prior is determined by the class label, i.e., +1 or -1. The truncated Gaussian prior not only restricts the sign of the weights but also leads to a sparse estimation of the weight vector, and thus controls the complexity of the model. In PCVMs, the kernel parameters can be optimized simultaneously within the training algorithm. The performance of PCVMs is extensively evaluated on four synthetic data sets and 13 benchmark data sets using three performance metrics: error rate (ERR), area under the receiver operating characteristic curve (AUC), and root mean squared error (RMSE). We compare PCVMs with soft-margin support vector machines (SVMSoft), hard-margin support vector machines (SVMHard), SVMs with kernel parameters optimized by PCVMs (SVMPCVM), relevance vector machines (RVMs), and some other baseline classifiers. Using the five-replication twofold cross-validation F test, i.e., the 5 × 2 cross-validation F test, over single data sets, and the Friedman test with the corresponding post hoc test over multiple data sets, we find that PCVMs outperform the other algorithms, including SVMSoft, SVMHard, RVM, and SVMPCVM, on most of the data sets under all three metrics, especially under AUC. Our results also reveal that SVMPCVM performs slightly better than SVMSoft, implying that the parameter optimization algorithm in PCVMs is better than cross-validation in terms of both performance and computational complexity. We also discuss the superiority of the PCVM formulation using maximum a posteriori (MAP) analysis and margin analysis, which explain the empirical success of PCVMs.
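The abstract's central construction, the signed and truncated Gaussian prior, can be made concrete with a short sketch. The snippet below is an illustrative reading of the prior, not the authors' code: each weight w_i is confined to the half-line whose sign matches the class label y_i in {-1, +1}, and the allowed half-line carries twice the Gaussian density so the prior still integrates to one. The function name and the per-weight precision parameter alpha are assumptions for illustration.

```python
import numpy as np

def truncated_gaussian_prior(w, y, alpha):
    """Density of a signed, truncated Gaussian prior (illustrative sketch).

    w     : weight values
    y     : class labels in {-1, +1} fixing the allowed sign of each weight
    alpha : Gaussian precision (inverse variance) for each weight
    """
    w, y, alpha = np.asarray(w), np.asarray(y), np.asarray(alpha)
    # Zero-mean Gaussian density N(w; 0, 1/alpha).
    gauss = np.sqrt(alpha / (2.0 * np.pi)) * np.exp(-0.5 * alpha * w**2)
    # Keep only the half-line that agrees with the class label; double the
    # density there so the truncated prior still normalizes to one.
    return np.where(y * w >= 0, 2.0 * gauss, 0.0)
```

The abstract also leans on the 5 × 2 cross-validation F test to compare classifiers on a single data set. A minimal sketch of the combined F statistic, assuming `p` holds the fold-wise differences in error rate between the two classifiers; under the null hypothesis of equal performance the statistic is approximately F-distributed with 10 and 5 degrees of freedom:

```python
import numpy as np
from scipy import stats

def combined_5x2cv_f_test(p):
    """Combined 5 x 2 cv F test (sketch).

    p : array of shape (5, 2); p[i, j] is the difference in error rate
        between the two classifiers on fold j of replication i.
    Returns the F statistic and its p-value under F(10, 5).
    """
    p = np.asarray(p, dtype=float)
    p_bar = p.mean(axis=1, keepdims=True)        # per-replication mean
    s2 = ((p - p_bar) ** 2).sum(axis=1)          # per-replication variance
    f = (p ** 2).sum() / (2.0 * s2.sum())        # combined F statistic
    return f, stats.f.sf(f, 10, 5)               # right-tail p-value
```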
Pages: 901-914
Page count: 14