Fast-learning VIEWNET architectures for recognizing three-dimensional objects from multiple two-dimensional views

Cited by: 51
Authors
Bradski, G [1 ]
Grossberg, S [1 ]
Affiliations
[1] BOSTON UNIV,CTR ADAPT SYST,BOSTON,MA 02215
Funding
US National Science Foundation;
Keywords
pattern recognition; learning; neural network; ARTMAP; inferotemporal cortex;
DOI
10.1016/0893-6080(95)00053-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recognition of three-dimensional (3-D) objects from sequences of their two-dimensional (2-D) views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that classifies the preprocessed representations into 2-D view categories whose outputs are combined into 3-D invariant object categories, and a working memory that makes a 3-D object prediction by accumulating evidence from 3-D object category nodes as multiple 2-D views are experienced. The simplest VIEWNET achieves high recognition scores without the need to explicitly code the temporal order of 2-D views in working memory. Working memories are also discussed that save memory resources by implicitly coding temporal order in terms of the relative activity of 2-D view category nodes, rather than as explicit 2-D view transitions. Variants of the VIEWNET architecture may be used for scene understanding by using a preprocessor and classifier that can determine both what objects are in a scene and where they are located. The present VIEWNET preprocessor includes the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and suppresses image noise. This boundary segmentation is rendered invariant under 2-D translation, rotation, and dilation by use of a log-polar transform. The invariant spectra undergo Gaussian coarse coding to further reduce noise and 3-D foreshortening effects, and to increase generalization. These compressed codes are input into the classifier, a supervised learning system based on the fuzzy ARTMAP algorithm. Fuzzy ARTMAP learns 2-D view categories that are invariant under 2-D image translation, rotation, and dilation as well as 3-D image transformations that do not cause a predictive error. Evidence from sequences of 2-D view categories converges at 3-D object nodes that generate a response invariant under changes of 2-D view. These 3-D object nodes input to a working memory that accumulates evidence over time to improve object recognition. In the simplest working memory, each occurrence (nonoccurrence) of a 2-D view category increases (decreases) the corresponding node's activity in working memory. The maximally active node is used to predict the 3-D object. Recognition is studied with noisy and clean images using slow and fast learning. Slow learning at the fuzzy ARTMAP map field is adapted to learn the conditional probability of the 3-D object given the selected 2-D view category. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 128x128 2-D views of aircraft with and without additive noise. A recognition rate of up to 90% is achieved with a single 2-D view, and up to 98.5% with three 2-D views. The properties of 2-D view and 3-D object category nodes are compared with those of cells in monkey inferotemporal cortex.
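The evidence-accumulating working memory described above can be illustrated with a short sketch. The code below is not the authors' implementation: the class name, the gain and decay constants, and the toy view-to-object mapping are assumptions introduced only to show the increment/decrement-and-argmax scheme, in which each classified 2-D view raises the activity of its associated 3-D object node, non-selected nodes decay, and the maximally active node gives the 3-D object prediction.

    # Minimal sketch (assumptions, not the paper's code) of the simplest
    # VIEWNET-style working memory: accumulate evidence for 3-D objects
    # from a sequence of classified 2-D view categories.
    from collections import defaultdict

    class EvidenceWorkingMemory:
        """Accumulate evidence for 3-D objects over a sequence of 2-D view categories."""

        def __init__(self, view_to_object, gain=1.0, decay=0.1):
            # view_to_object: mapping learned by the classifier (e.g., fuzzy ARTMAP)
            # from a 2-D view category label to its associated 3-D object label.
            self.view_to_object = view_to_object
            self.gain = gain      # increment for the object node selected by the current view
            self.decay = decay    # decrement applied to all non-selected object nodes
            self.activity = defaultdict(float)

        def observe(self, view_category):
            """Update object-node activities after one 2-D view is classified."""
            predicted = self.view_to_object[view_category]
            for obj in set(self.view_to_object.values()):
                if obj == predicted:
                    self.activity[obj] += self.gain   # occurrence raises activity
                else:
                    # nonoccurrence lowers activity (floored at zero)
                    self.activity[obj] = max(0.0, self.activity[obj] - self.decay)

        def predict(self):
            """Return the maximally active 3-D object node."""
            return max(self.activity, key=self.activity.get)

    if __name__ == "__main__":
        # Toy mapping: three hypothetical view categories belonging to two objects.
        mapping = {"viewA1": "aircraft_A", "viewA2": "aircraft_A", "viewB1": "aircraft_B"}
        wm = EvidenceWorkingMemory(mapping)
        for v in ["viewA1", "viewB1", "viewA2"]:   # a short sequence of classified views
            wm.observe(v)
        print(wm.predict())  # -> "aircraft_A" (two of the three views support it)

In this toy run the prediction improves as more 2-D views accumulate, mirroring the abstract's report that recognition rises from roughly 90% with one view to 98.5% with three views; the probability-weighted map-field variant mentioned in the abstract would weight each vote by the learned conditional probability of the object given the view category.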
Pages: 1053-1080
Page count: 28