Performance guarantees for regularized maximum entropy density estimation

Cited by: 78
Authors
Dudík, M
Phillips, SJ
Schapire, RE
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
Source
LEARNING THEORY, PROCEEDINGS | 2004 / Vol. 3120
Keywords
DOI
10.1007/978-3-540-27819-1_33
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting with a very large number of features, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the feature empirical averages relative to their true expectations, a number that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples, and that depend very moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
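As a reading aid, the duality the abstract invokes can be sketched as follows; the notation (\beta_j, \lambda, q_\lambda, Z_\lambda) is introduced here for illustration and is not taken from this record. With m samples, empirical feature averages \tilde{E}[f_j], and relaxation widths \beta_j \ge 0, the relaxed maxent primal is

    \max_{p} \; H(p) \quad \text{subject to} \quad |E_p[f_j] - \tilde{E}[f_j]| \le \beta_j \;\; \text{for all } j,

and by convex duality this is equivalent to minimizing an l1-regularized empirical log loss over Gibbs distributions q_\lambda(x) = e^{\lambda \cdot f(x)} / Z_\lambda:

    \min_{\lambda \in \mathbb{R}^n} \; -\frac{1}{m} \sum_{i=1}^{m} \ln q_\lambda(x_i) \; + \; \sum_{j=1}^{n} \beta_j \, |\lambda_j|.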
Pages: 472-486
Page count: 15