Learning the Lie groups of visual invariance

Cited by: 26

Authors:

Miao, Xu [1]

Rao, Rajesh P. N. [1]

Affiliations:

[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA

Source:

NEURAL COMPUTATION | 2007 / Vol. 19 / No. 10

Keywords:

DOI:

10.1162/neco.2007.19.10.2665

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A fundamental problem in biological and machine vision is visual invariance: How are objects perceived to be the same despite transformations such as translations, rotations, and scaling? In this letter, we describe a new, unsupervised approach to learning invariances based on Lie group theory. Unlike traditional approaches that sacrifice information about transformations to achieve invariance, the Lie group approach explicitly models the effects of transformations in images. As a result, estimates of transformations are available for other purposes, such as pose estimation and visuomotor control. Previous approaches based on first-order Taylor series expansions of images can be regarded as special cases of the Lie group approach, which utilizes a matrix-exponential-based generative model of images and can handle arbitrarily large transformations. We present an unsupervised expectation-maximization algorithm for learning Lie transformation operators directly from image data containing examples of transformations. Our experimental results show that the Lie operators learned by the algorithm from an artificial data set containing six types of affine transformations closely match the analytically predicted affine operators. We then demonstrate that the algorithm can also recover novel transformation operators from natural image sequences. We conclude by showing that the learned operators can be used to both generate and estimate transformations in images, thereby providing a basis for achieving visual invariance.
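The contrast the abstract draws between a matrix-exponential generative model and a first-order Taylor approximation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1-D periodic signal of odd length in place of an image, a Fourier-based derivative matrix as the translation operator, and a simple truncated-series matrix exponential. The names `translation_generator`, `expm`, `transform`, and `first_order` are hypothetical and chosen only for this example.

```python
import numpy as np

def translation_generator(n):
    """Derivative operator G for a periodic 1-D signal (assumption: n is odd)."""
    omega = 2j * np.pi * np.fft.fftfreq(n)   # i * angular frequency per sample
    F = np.fft.fft(np.eye(n), axis=0)        # DFT matrix, F @ x == np.fft.fft(x)
    G = (F.conj().T / n) @ np.diag(omega) @ F
    return G.real                            # imaginary part is rounding noise for odd n

def expm(A, terms=20, squarings=8):
    """Matrix exponential via a truncated Taylor series with scaling and squaring."""
    A = A / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ A / j
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def transform(signal, G, z):
    """Generative model: signal(z) = exp(z G) @ signal(0)."""
    return expm(z * G) @ signal

def first_order(signal, G, z):
    """First-order Taylor approximation: signal(z) ~ (I + z G) @ signal(0)."""
    return signal + z * (G @ signal)

n = 15
x = np.arange(n)
signal = np.sin(2 * np.pi * x / n)
G = translation_generator(n)

for z in (0.1, 3.0):
    gap = np.max(np.abs(first_order(signal, G, z) - transform(signal, G, z)))
    print(f"z = {z}: max gap between first-order and exponential model = {gap:.4f}")
```

In this sketch the two models nearly agree for a small shift and diverge for a large one, while the exponential model reproduces an integer shift of the sampled signal. The learning step described in the abstract, in which the operator is estimated from data by expectation-maximization, is not shown here.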

Citation

Pages: 2665-2693

Page count: 29
