Automatic Indexing and Content-Based Retrieval of Captioned Images

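Cited by: 71
Author
SRIHARI, RK
Affiliation
[1] State University of New York, Buffalo
Keywords
Information retrieval systems
DOI
10.1109/2.410153
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract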
This research explores the interaction of textual and photographic information in an integrated text/image database environment developed at the Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York, Buffalo. The idea is to extract information from a newspaper photo caption that can be used for retrieving the picture and for identifying the people shown. A multistage system, called Piction, uses spatial and characteristic constraints derived from the caption in labeling face candidates generated by a face locator. Several other vision systems employ the idea of top-down control in picture understanding by providing the general context; this system carries the notion one step further, exploiting not only general context but also picture-specific context. The author gives several examples showing how information from both text and images can be used in computing the similarity between a given query and an image in the database to satisfy focus-of-attention queries. Although Piction represents only a preliminary foray into truly integrated text/image content-based retrieval, it shows that additional discriminatory capabilities can be obtained by combining the two sources of information. Much work remains, however, both in improving the language processing capabilities and in face location and characterization.
Pages: 49-56
Page count: 8