Separating style and content with bilinear models

Cited by: 574
Authors
Tenenbaum, JB
Freeman, WT
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[2] Mitsubishi Elect Res Lab, Cambridge, MA 02139 USA
DOI
10.1162/089976600300015349
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Perceptual systems routinely separate "content" from "style," classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive (Hofstadter, 1985). Existing factor models (Mardia, Kent, & Bibby, 1979; Hinton & Zemel, 1994; Ghahramani, 1995; Bell & Sejnowski, 1995; Hinton, Dayan, Frey, & Neal, 1995; Dayan, Hinton, Neal, & Zemel, 1995; Hinton & Ghahramani, 1997) are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. We present a general framework for learning to solve two-factor tasks using bilinear models, which provide sufficiently expressive representations of factor interactions but can nonetheless be fit to data using efficient algorithms based on the singular value decomposition and expectation-maximization. We report promising results on three different tasks in three different perceptual domains: spoken vowel classification with a benchmark multispeaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
Pages: 1247-1283
Page count: 37
Related papers
56 entries in total
  • [11] BRAINARD D, 1991, COMPUTATIONAL MODELS, pCH13
  • [12] Caruana R, 1998, LEARNING TO LEARN, P95, DOI 10.1007/978-1-4615-5529-2_5
  • [13] THE HELMHOLTZ MACHINE
    DAYAN, P
    HINTON, GE
    NEAL, RM
    ZEMEL, RS
    [J]. NEURAL COMPUTATION, 1995, 7 (05) : 889 - 904
  • [14] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [15] COLOR CONSTANCY - SURFACE COLOR FROM CHANGING ILLUMINATION
    DZMURA, M
    [J]. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 1992, 9 (03) : 490 - 493
  • [16] Learning bilinear models for two-factor problems in vision
    Freeman, WT
    Tenenbaum, JB
    [J]. 1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1997 : 554 - 560
  • [17] GHAHRAMANI Z, 1995, ADV NEURAL INFORMATI, V7, P617
  • [18] CONNECTIONIST GENERALIZATION FOR PRODUCTION - AN EXAMPLE FROM GRIDFONT
    GREBERT, I
    STORK, DG
    KEESING, R
    MIMS, S
    [J]. NEURAL NETWORKS, 1992, 5 (04) : 699 - 710
  • [19] HALLINAN PW, 1994, 1994 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, P995, DOI 10.1109/CVPR.1994.323941
  • [20] Hart P.E., 1973, Pattern recognition and scene analysis