Unsupervised clustering of ambulatory audio and video

Cited by: 36
Authors
Clarkson, B [1]
Pentland, A [1]
Affiliations
[1] MIT, Media Lab, Perceptual Comp, Cambridge, MA 02139 USA
Source
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999
Keywords
DOI
10.1109/ICASSP.1999.757481
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A truly personal and reactive computer system should have access to the same information as its user, including the ambient sights and sounds. To this end, we have developed a system for extracting events and scenes from natural audio/visual input. We find our system can (without any prior labeling of data) cluster the audio/visual data into events, such as passing through doors and crossing the street. Also, we hierarchically cluster these events into scenes and get clusters that correlate with visiting the supermarket, or walking down a busy street.
Pages: 3037-3040
Page count: 4