A TEMPERING APPROACH FOR ITAKURA-SAITO NON-NEGATIVE MATRIX FACTORIZATION. WITH APPLICATION TO MUSIC TRANSCRIPTION

Cited by: 13
Authors
Bertin, Nancy [1 ]
Fevotte, Cedric [1 ]
Badeau, Roland [1 ]
Affiliations
[1] CNRS LTCI TELECOM ParisTech ENST, F-75634 Paris 13, France
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Non-negative matrix factorization (NMF); Itakura-Saito (IS) divergence; beta divergence; music transcription
DOI
10.1109/ICASSP.2009.4959891
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper we are interested in non-negative matrix factorization (NMF) with the Itakura-Saito (IS) divergence. Previous work has demonstrated the relevance of this cost function for the decomposition of audio power spectrograms. This is in particular due to its scale invariance, which makes it more robust to the wide dynamics of audio, a property which is not shared by other popular costs such as the Euclidean distance or the generalized Kullback-Leibler (KL) divergence. However, while the latter two cost functions are convex, the IS divergence is not, which makes it more prone to convergence to irrelevant local minima, as observed empirically. Thus, the aim of this paper is to propose a tempering scheme that favors convergence of IS-NMF to global minima. Our algorithm is based on NMF with the beta-divergence, where the shape parameter beta acts as a temperature parameter. Results on both synthetic and music data (in a transcription context) show the relevance of our approach.
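The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the update rules nor the temperature schedule, so the sketch assumes the standard heuristic multiplicative updates for beta-NMF and a hypothetical linear schedule that lowers beta from 2 (Euclidean) to 0 (IS). The names `beta_divergence` and `tempered_is_nmf`, and the parameters `betas` and `iters_per_beta`, are illustrative only.

```python
import numpy as np


def beta_divergence(V, Vh, beta):
    """Sum of element-wise beta-divergences d_beta(V | Vh), positive arrays."""
    if beta == 0:  # Itakura-Saito divergence
        R = V / Vh
        return float(np.sum(R - np.log(R) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / Vh) - V + Vh))
    # General case; beta = 2 gives half the squared Euclidean distance.
    num = V**beta + (beta - 1.0) * Vh**beta - beta * V * Vh ** (beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))


def tempered_is_nmf(V, rank, betas, iters_per_beta=50, seed=0):
    """Factorize V ~ W @ H while beta follows the schedule `betas`.

    Assumption: the schedule ends at beta = 0, so the last stage is IS-NMF
    initialized from the solution of the previous, higher-beta stages.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + 0.1  # strictly positive initialization
    H = rng.random((rank, n_cols)) + 0.1
    eps = 1e-12  # guards against division by zero
    for beta in betas:
        for _ in range(iters_per_beta):
            # Heuristic multiplicative updates for the beta-divergence.
            Vh = W @ H + eps
            H *= (W.T @ (V * Vh ** (beta - 2.0))) / (W.T @ Vh ** (beta - 1.0) + eps)
            Vh = W @ H + eps
            W *= ((V * Vh ** (beta - 2.0)) @ H.T) / (Vh ** (beta - 1.0) @ H.T + eps)
    return W, H


# Toy usage on a synthetic rank-2 non-negative matrix.
rng = np.random.default_rng(1)
V = (rng.random((6, 2)) + 0.1) @ (rng.random((2, 8)) + 0.1)
W, H = tempered_is_nmf(V, rank=2, betas=np.linspace(2.0, 0.0, 5), iters_per_beta=20)
print("IS divergence after tempering:", beta_divergence(V, W @ H, 0))
```

The scale invariance mentioned in the abstract can be checked with the same helper: `beta_divergence(10 * V, 10 * Vh, 0)` equals `beta_divergence(V, Vh, 0)`, whereas for beta = 2 the value grows by a factor of 100.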
Pages: 1545-1548
Number of pages: 4