Layered representations for learning and inferring office activity from multiple sensory channels

Cited by: 169
Authors
Oliver, N [1]
Garg, A
Horvitz, E
Affiliations
[1] Microsoft Res, Adapt Syst & Interact, Redmond, WA USA
[2] Univ Illinois, Dept Comp Sci, Champaign, IL USA
Keywords
office awareness; office activity recognition; multi-modal systems; human behavior understanding; hidden Markov models
DOI
10.1016/j.cviu.2004.02.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present the use of layered probabilistic representations for modeling human activities, and describe how we use the representation to do sensing, learning, and inference at multiple levels of temporal granularity and abstraction and from heterogeneous data sources. The approach centers on the use of a cascade of Hidden Markov Models named Layered Hidden Markov Models (LHMMs) to diagnose states of a user's activity based on real-time streams of evidence from video, audio, and computer (keyboard and mouse) interactions. We couple these LHMMs with an expected utility analysis that considers the cost of misclassification. We describe the representation, present an implementation, and report on experiments with our layered architecture in a real-time office-awareness setting. © 2004 Elsevier Inc. All rights reserved.
Pages: 163-180
Number of pages: 18
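Illustrative sketch
The abstract describes a cascade of HMMs whose output feeds an expected-utility decision. The Python below is a minimal sketch of that idea and not the authors' implementation. It assumes discrete observation symbols, a uniform prior over models, and that each lower-layer window is passed upward as its single most likely label. All model names, parameters, and costs (`speech`, `silence`, `conversation`, `user_away`, `interrupt`, `defer`) are hypothetical toy values chosen for illustration.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete sequence under one HMM (scaled forward pass)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

def classify(obs, bank):
    """Posterior over the models in a bank, assuming a uniform prior over models."""
    logliks = {name: forward_loglik(obs, *params) for name, params in bank.items()}
    top = max(logliks.values())
    weights = {name: math.exp(v - top) for name, v in logliks.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def layered_posterior(windows, lower_bank, lower_labels, upper_bank):
    """Lower layer labels each short window; the label sequence is the upper layer's input."""
    symbols = []
    for window in windows:
        post = classify(window, lower_bank)
        symbols.append(lower_labels.index(max(post, key=post.get)))
    return classify(symbols, upper_bank)

def min_expected_cost(posterior, cost):
    """cost[decision][true_state]; return the decision with the lowest expected cost."""
    expected = {d: sum(posterior[s] * c for s, c in row.items())
                for d, row in cost.items()}
    return min(expected, key=expected.get), expected

# Toy parameters (hypothetical). Each model is (start, transition, emission).
STICKY = [[0.8, 0.2], [0.2, 0.8]]
lower_labels = ["speech", "silence"]          # audio symbols: 0 = quiet, 1 = loud
lower_bank = {
    "speech":  ([0.5, 0.5], STICKY, [[0.2, 0.8], [0.3, 0.7]]),
    "silence": ([0.5, 0.5], STICKY, [[0.9, 0.1], [0.8, 0.2]]),
}
upper_bank = {                                # symbols: indices into lower_labels
    "conversation": ([0.5, 0.5], STICKY, [[0.8, 0.2], [0.7, 0.3]]),
    "user_away":    ([0.5, 0.5], STICKY, [[0.1, 0.9], [0.2, 0.8]]),
}
cost = {                                      # cost of each decision given the true state
    "interrupt": {"conversation": 10.0, "user_away": 0.0},
    "defer":     {"conversation": 0.0,  "user_away": 1.0},
}

windows = [[1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
posterior = layered_posterior(windows, lower_bank, lower_labels, upper_bank)
decision, expected = min_expected_cost(posterior, cost)
print(posterior, decision)
```

Because the lower layer works on short windows and the upper layer on the resulting label sequence, the two layers operate at different time scales. The cost table lets an asymmetric penalty, rather than the most probable state alone, drive the final decision.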