A generative model for music transcription

Cited by: 69
Authors
Cemgil, AT [1]
Kappen, HJ
Barber, D
Affiliations
[1] SNN, Stichting Neurale Netwerken, NL-6525 EZ Nijmegen, Netherlands
[2] Univ Amsterdam, Inst Informat, NL-1098 SJ Amsterdam, Netherlands
[3] Radboud Univ Nijmegen, NL-6525 EZ Nijmegen, Netherlands
[4] IDIAP, CH-1920 Martigny, Switzerland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 02
Keywords
Bayesian signal processing; music transcription; polyphonic pitch tracking; switching Kalman filters;
DOI
10.1109/TSA.2005.852985
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we present a graphical model for polyphonic music transcription. Our model, formulated as a dynamical Bayesian network, embodies a transparent and computationally tractable approach to this acoustic analysis problem. An advantage of our approach is that it places emphasis on explicitly modeling the sound generation procedure. It provides a clear framework in which high-level (cognitive) prior information on music structure can be coupled with low-level (acoustic physical) information in a principled manner to perform the analysis. The model is a special case of the generally intractable switching Kalman filter model. Where possible, we derive exact polynomial-time inference procedures, and otherwise efficient approximations. We argue that our generative-model-based approach is computationally feasible for many music applications and is readily extensible to more general auditory scene analysis scenarios.
Pages: 679-694
Page count: 16