Fast-learning VIEWNET architectures for recognizing three-dimensional objects from multiple two-dimensional views

Cited by: 51
Authors
Bradski, G [1 ]
Grossberg, S [1 ]
Affiliations
[1] BOSTON UNIV,CTR ADAPT SYST,BOSTON,MA 02215
Funding
US National Science Foundation;
Keywords
pattern recognition; learning; neural network; ARTMAP; inferotemporal cortex;
DOI
10.1016/0893-6080(95)00053-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recognition of three-dimensional (3-D) objects from sequences of their two-dimensional (2-D) views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that classifies the preprocessed representations into 2-D view categories whose outputs are combined into 3-D invariant object categories, and a working memory that makes a 3-D object prediction by accumulating evidence from 3-D object category nodes as multiple 2-D views are experienced. The simplest VIEWNET achieves high recognition scores without the need to explicitly code the temporal order of 2-D views in working memory. Working memories are also discussed that save memory resources by implicitly coding temporal order in terms of the relative activity of 2-D view category nodes, rather than as explicit 2-D view transitions. Variants of the VIEWNET architecture may be used for scene understanding by using a preprocessor and classifier that can determine both what objects are in a scene and where they are located. The present VIEWNET preprocessor includes the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and suppresses image noise. This boundary segmentation is rendered invariant under 2-D translation, rotation, and dilation by use of a log-polar transform. The invariant spectra undergo Gaussian coarse coding to further reduce noise and 3-D foreshortening effects, and to increase generalization. These compressed codes are input into the classifier, a supervised learning system based on the fuzzy ARTMAP algorithm. Fuzzy ARTMAP learns 2-D view categories that are invariant under 2-D image translation, rotation, and dilation as well as 3-D image transformations that do not cause a predictive error. Evidence from sequences of 2-D view categories converges at 3-D object nodes that generate a response invariant under changes of 2-D view. These 3-D object nodes input to a working memory that accumulates evidence over time to improve object recognition. In the simplest working memory, each occurrence (nonoccurrence) of a 2-D view category increases (decreases) the corresponding node's activity in working memory. The maximally active node is used to predict the 3-D object. Recognition is studied with noisy and clean images using slow and fast learning. Slow learning at the fuzzy ARTMAP map field is adapted to learn the conditional probability of the 3-D object given the selected 2-D view category. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 128x128 2-D views of aircraft with and without additive noise. A recognition rate of up to 90% is achieved with a single 2-D view, and up to 98.5% with three 2-D views. The properties of 2-D view and 3-D object category nodes are compared with those of cells in monkey inferotemporal cortex.
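The evidence-accumulating working memory described above can be illustrated with a short sketch. The code below is not the authors' implementation: the class name, the gain and decay constants, and the toy view-to-object mapping are assumptions introduced only to show the increment/decrement-and-argmax scheme, in which each classified 2-D view raises the activity of its associated 3-D object node, non-selected nodes decay, and the maximally active node gives the 3-D object prediction.

    # Minimal sketch (assumptions, not the paper's code) of the simplest
    # VIEWNET-style working memory: accumulate evidence for 3-D objects
    # from a sequence of classified 2-D view categories.
    from collections import defaultdict

    class EvidenceWorkingMemory:
        """Accumulate evidence for 3-D objects over a sequence of 2-D view categories."""

        def __init__(self, view_to_object, gain=1.0, decay=0.1):
            # view_to_object: mapping learned by the classifier (e.g., fuzzy ARTMAP)
            # from a 2-D view category label to its associated 3-D object label.
            self.view_to_object = view_to_object
            self.gain = gain      # increment for the object node selected by the current view
            self.decay = decay    # decrement applied to all non-selected object nodes
            self.activity = defaultdict(float)

        def observe(self, view_category):
            """Update object-node activities after one 2-D view is classified."""
            predicted = self.view_to_object[view_category]
            for obj in set(self.view_to_object.values()):
                if obj == predicted:
                    self.activity[obj] += self.gain   # occurrence raises activity
                else:
                    # nonoccurrence lowers activity (floored at zero)
                    self.activity[obj] = max(0.0, self.activity[obj] - self.decay)

        def predict(self):
            """Return the maximally active 3-D object node."""
            return max(self.activity, key=self.activity.get)

    if __name__ == "__main__":
        # Toy mapping: three hypothetical view categories belonging to two objects.
        mapping = {"viewA1": "aircraft_A", "viewA2": "aircraft_A", "viewB1": "aircraft_B"}
        wm = EvidenceWorkingMemory(mapping)
        for v in ["viewA1", "viewB1", "viewA2"]:   # a short sequence of classified views
            wm.observe(v)
        print(wm.predict())  # -> "aircraft_A" (two of the three views support it)

In this toy run the prediction improves as more 2-D views accumulate, mirroring the abstract's report that recognition rises from roughly 90% with one view to 98.5% with three views; the probability-weighted map-field variant mentioned in the abstract would weight each vote by the learned conditional probability of the object given the view category.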
Pages: 1053-1080
Page count: 28