Applying MDL to learn best model granularity

Cited by: 18
Authors
Gao, Q
Li, M
Vitányi, P
Affiliations
[1] Ctr Wiskunde & Informat, NL-1098 SJ Amsterdam, Netherlands
[2] Chinese Acad Sci, Inst Bot, Beijing, Peoples R China
[3] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Minimum Description Length principle (MDL); Kolmogorov complexity; universal prior; Bayes' rule; Occam's razor; learning best model granularity; on-line handwritten character recognition; improved elastic matching; learning optimal feature extraction interval; modeling robot arm; feedforward neural network; learning optimal hidden layer;
DOI
10.1016/S0004-3702(00)00034-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high a precision generally leads to modeling of accidental noise, while too low a precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor". In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is obtained at the value of the learned parameter (the length of the sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three-layer feed-forward neural network, where we need to determine the number of nodes in the hidden layer that gives the best modeling performance. The optimal model (the one that extrapolates best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally. (C) 2000 Elsevier Science B.V. All rights reserved.
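A minimal Python sketch of the two-part MDL selection described in the abstract, applied to the robot-arm setting: it fits feed-forward networks of increasing hidden-layer size to synthetic two-joint arm data and keeps the size with the shortest total code length. The fixed 16-bit parameter precision, the Gaussian coding of residuals, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the encoding or network used in the paper.

# Hypothetical sketch (not the authors' exact encoding): choose the number of hidden
# units for a 2-DOF robot-arm model by minimizing a two-part MDL code length.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic forward kinematics of a planar two-joint arm (unit link lengths assumed).
n_samples = 500
theta = rng.uniform(-np.pi, np.pi, size=(n_samples, 2))
X = theta
Y = np.column_stack([
    np.cos(theta[:, 0]) + np.cos(theta[:, 0] + theta[:, 1]),
    np.sin(theta[:, 0]) + np.sin(theta[:, 0] + theta[:, 1]),
])
Y += rng.normal(scale=0.05, size=Y.shape)  # measurement noise

BITS_PER_WEIGHT = 16  # assumed fixed parameter precision; the paper treats precision itself as learnable

def two_part_code_length(model, X, Y):
    """Approximate two-part MDL code length in bits: L(model) + L(data | model)."""
    # Model part: count weights and biases, charge a fixed precision per parameter.
    n_params = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
    model_bits = n_params * BITS_PER_WEIGHT
    # Data part: code the residuals with a Gaussian whose variance is the empirical MSE.
    resid = Y - model.predict(X)
    n = resid.size
    mse = np.mean(resid ** 2) + 1e-12
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * mse)
    return model_bits + data_bits

best = None
for hidden in (1, 2, 4, 8, 16, 32):
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=5000, random_state=0)
    net.fit(X, Y)
    bits = two_part_code_length(net, X, Y)
    print(f"hidden units = {hidden:2d}  code length ~ {bits:10.1f} bits")
    if best is None or bits < best[1]:
        best = (hidden, bits)

print("MDL-preferred hidden layer size:", best[0])

Under the paper's argument, the hidden-layer size minimizing this code length should roughly track the size that generalizes best on held-out data; larger networks drive the data term down but pay for it in the model term.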
Pages: 1-29
Page count: 29