Inducing features of random fields

被引:468
作者
DellaPietra, S [1 ]
DellaPietra, V [1 ]
Lafferty, J [1 ]
机构
[1] CARNEGIE MELLON UNIV,SCH COMP SCI,DEPT COMP SCI,PITTSBURGH,PA 15213
基金
美国国家科学基金会;
关键词
random field; Kullback-Leibler divergence; iterative scaling; maximum entropy; EM algorithm; statistical learning; clustering; word morphology; natural language processing;
D O I
10.1109/34.588021
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a technique for constructing random fields from a see of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kuliback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.
引用
收藏
页码:380 / 393
页数:14
相关论文
共 22 条
[1]  
Almeida M.P., 1993, ANN APPL PROBAB, V3, P103
[2]   NONCAUSAL GAUSS-MARKOV RANDOM-FIELDS - PARAMETER STRUCTURE AND ESTIMATION [J].
BALRAM, N ;
MOURA, JMF .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1993, 39 (04) :1333-1355
[3]  
Berger AL, 1996, COMPUT LINGUIST, V22, P39
[4]  
BREIAMN L, 1984, CLASSIFICATION REGRE
[5]  
Brown D. T., 1959, INFORM CONTR, V2, P386, DOI DOI 10.1016/S0019-9958(59)80016-4
[6]  
Brown P. F., 1992, Computational Linguistics, V18, P467
[7]  
Brown P. F., 1990, Computational Linguistics, V16, P79
[8]   AN ITERATIVE GIBBSIAN TECHNIQUE FOR RECONSTRUCTION OF M-ARY IMAGES [J].
CHALMOND, B .
PATTERN RECOGNITION, 1989, 22 (06) :747-761
[9]   A GEOMETRIC INTERPRETATION OF DARROCH AND RATCLIFF GENERALIZED ITERATIVE SCALING [J].
CSISZAR, I .
ANNALS OF STATISTICS, 1989, 17 (03) :1409-1413
[10]   I-DIVERGENCE GEOMETRY OF PROBABILITY DISTRIBUTIONS AND MINIMIZATION PROBLEMS [J].
CSISZAR, I .
ANNALS OF PROBABILITY, 1975, 3 (01) :146-158