Fast-learning VIEWNET architectures for recognizing three-dimensional objects from multiple two-dimensional views

Cited by: 51
Authors
Bradski, G [1]
Grossberg, S [1]
Affiliation
[1] BOSTON UNIV, CTR ADAPT SYST, BOSTON, MA 02215
Funding
US National Science Foundation;
Keywords
pattern recognition; learning; neural network; ARTMAP; inferotemporal cortex;
DOI
10.1016/0893-6080(95)00053-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recognition of three-dimensional (3-D) objects from sequences of their two-dimensional (2-D) views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that classifies the preprocessed representations into 2-D view categories whose outputs are combined into 3-D invariant object categories, and a working memory that makes a 3-D object prediction by accumulating evidence from 3-D object category nodes as multiple 2-D views are experienced. The simplest VIEWNET achieves high recognition scores without the need to explicitly code the temporal order of 2-D views in working memory. Working memories are also discussed that save memory resources by implicitly coding temporal order in terms of the relative activity of 2-D view category nodes, rather than as explicit 2-D view transitions. Variants of the VIEWNET architecture may be used for scene understanding by using a preprocessor and classifier that can determine both what objects are in a scene and where they are located. The present VIEWNET preprocessor includes the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and suppresses image noise. This boundary segmentation is rendered invariant under 2-D translation, rotation, and dilation by use of a log-polar transform. The invariant spectra undergo Gaussian coarse coding to further reduce noise and 3-D foreshortening effects, and to increase generalization. These compressed codes are input into the classifier, a supervised learning system based on the fuzzy ARTMAP algorithm. Fuzzy ARTMAP learns 2-D view categories that are invariant under 2-D image translation, rotation, and dilation as well as 3-D image transformations that do not cause a predictive error. 
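The invariance argument behind the log-polar step can be illustrated with a minimal sketch. This is not the VIEWNET preprocessor (it omits the CORT-X 2 filter, the spectra, and the coarse coding); it only assumes a boundary given as a hypothetical list of 2-D points and shows that, once coordinates are taken about the centroid, a dilation becomes a constant offset in log radius and a rotation becomes a constant offset in angle. The names `log_polar` and `transform` and the sample points are illustrative assumptions.

```python
import numpy as np

def log_polar(points):
    """Map 2-D points to (log radius, angle) measured about their centroid."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)  # centering removes 2-D translation
    radius = np.hypot(centered[:, 0], centered[:, 1])
    angle = np.arctan2(centered[:, 1], centered[:, 0])
    return np.log(radius), angle

def transform(points, scale, phi, shift):
    """Rotate by phi (radians), dilate by scale, then translate by shift."""
    c, s = np.cos(phi), np.sin(phi)
    rot = np.array([[c, -s], [s, c]])
    return scale * np.asarray(points, dtype=float) @ rot.T + np.asarray(shift)

# Hypothetical boundary points (no point coincides with the centroid).
boundary = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
moved = transform(boundary, scale=2.0, phi=0.5, shift=(10.0, -3.0))

log_r0, th0 = log_polar(boundary)
log_r1, th1 = log_polar(moved)

# Dilation shows up as one shared offset in log radius, rotation as one
# shared offset in angle (wrapped to (-pi, pi]); translation has no effect.
log_r_offset = log_r1 - log_r0
angle_offset = np.angle(np.exp(1j * (th1 - th0)))
```

Because every point receives the same offset along each log-polar axis, any later stage that is insensitive to shifts along those axes is insensitive to 2-D rotation and dilation of the original image.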
Evidence from sequences of 2-D view categories converges at 3-D object nodes that generate a response invariant under changes of 2-D view. These 3-D object nodes input to a working memory that accumulates evidence over time to improve object recognition. In the simplest working memory, each occurrence (nonoccurrence) of a 2-D view category increases (decreases) the corresponding node's activity in working memory. The maximally active node is used to predict the 3-D object. Recognition is studied with noisy and clean images using slow and fast learning. Slow learning at the fuzzy ARTMAP map field is adapted to learn the conditional probability of the 3-D object given the selected 2-D view category. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 128 × 128 2-D views of aircraft with and without additive noise. A recognition rate of up to 90% is achieved with one 2-D view and of up to 98.5% with three 2-D views. The properties of 2-D view and 3-D object category nodes are compared with those of cells in monkey inferotemporal cortex.
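The simplest working memory described above can be sketched as follows. This is an illustrative reading of the abstract, not the paper's equations: it assumes each view has already been mapped to a single predicted object index, and the step sizes `gain` and `decay`, the clipping at zero, and the function name `accumulate_evidence` are all assumptions made for the example.

```python
import numpy as np

def accumulate_evidence(view_predictions, n_objects, gain=1.0, decay=0.5):
    """Accumulate per-object evidence over a sequence of views.

    Each view raises the activity of the object node it predicts by `gain`
    and lowers every other node by `decay`; activities are clipped at zero
    (an assumption of this sketch). The most active node is the prediction.
    """
    activity = np.zeros(n_objects)
    for obj in view_predictions:
        step = np.full(n_objects, -decay)  # nonoccurrence lowers activity
        step[obj] = gain                   # occurrence raises activity
        activity = np.maximum(activity + step, 0.0)
    return int(np.argmax(activity)), activity

# Three views: the middle one is misclassified as object 0.
predicted, activity = accumulate_evidence([2, 0, 2], n_objects=3)
print(predicted, activity)  # 2 [0.5 0.  1.5]
```

In this toy run a single misclassified view is outvoted by the other two, which is the mechanism the abstract credits for the higher recognition rate with three views than with one.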
Pages: 1053-1080
Page count: 28