Unsupervised clustering of ambulatory audio and video

Cited by: 36

Authors:

Clarkson, B [1]

Pentland, A [1]

Affiliations:

[1] MIT, Media Lab, Perceptual Comp, Cambridge, MA 02139 USA

Source:

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999

DOI:

10.1109/ICASSP.1999.757481

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

A truly personal and reactive computer system should have access to the same information as its user, including the ambient sights and sounds. To this end, we have developed a system for extracting events and scenes from natural audio/visual input. We find our system can (without any prior labeling of data) cluster the audio/visual data into events, such as passing through doors and crossing the street. Also, we hierarchically cluster these events into scenes and get clusters that correlate with visiting the supermarket, or walking down a busy street.

Pages: 3037-3040

Page count: 4