Large-scale Bayesian logistic regression for text categorization

Cited by: 444
Authors
Genkin, Alexander [1]
Lewis, David D.
Madigan, David
Affiliations
[1] Rutgers State Univ, DIMACS, Piscataway, NJ 08854 USA
[2] David D Lewis Consulting, Chicago, IL 60614 USA
Funding
National Science Foundation (USA)
Keywords
information retrieval; lasso; penalization; ridge regression; support vector classifier; variable selection
DOI
10.1198/004017007000000245
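Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract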
Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
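The link the abstract draws between a Laplace prior and sparse models rests on a standard fact: the posterior mode under an independent Laplace prior on the coefficients is the L1-penalized (lasso) logistic regression estimate. The Python sketch below illustrates this on synthetic data, assuming NumPy is available. It uses proximal gradient descent (ISTA) as a stand-in optimizer and is not the fitting algorithm of the paper or of BBR/BMR; the names `fit_l1_logistic`, `soft_threshold`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrinks each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_l1_logistic(X, y, lam=10.0, n_iter=500):
    """Posterior mode under an i.i.d. Laplace prior, via proximal gradient (ISTA).

    Minimizes  sum_i log(1 + exp(-y_i * x_i . w)) + lam * ||w||_1,
    where X is (n, d), y holds labels in {-1, +1}, and lam (hypothetical name)
    plays the role of the Laplace prior's rate parameter.
    """
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / 4 bounds the curvature of the loss.
    step = 4.0 / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # 1 / (1 + exp(margin)), written with tanh for numerical stability.
        p_wrong = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -X.T @ (y * p_wrong)  # gradient of the negative log-likelihood
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Synthetic illustration (not data from the paper): 20 features, 2 informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
signal = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(200)
y = np.where(signal > 0, 1.0, -1.0)

w = fit_l1_logistic(X, y)
print("nonzero coefficients:", np.count_nonzero(w), "of", w.size)
```

Because the soft-thresholding step sets small coefficients exactly to zero, the fitted vector is sparse, which is the compactness property the abstract refers to. A Gaussian prior (ridge penalty) would shrink coefficients without zeroing them.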
Pages: 291-304
Number of pages: 14