Locating singing voice segments within music signals

Cited by: 59
Authors
Berenzweig, AL [1 ]
Ellis, DPW [1 ]
Affiliation
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Source
PROCEEDINGS OF THE 2001 IEEE WORKSHOP ON THE APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS | 2001
DOI
10.1109/ASPAA.2001.969557
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A sung vocal line is the prominent feature of much popular music. It would be useful to reliably locate the portions of a musical track during which the vocals are present, both as a 'signature' of the piece and as a precursor to automatic recognition of lyrics. Here, we approach this problem by using the acoustic classifier of a speech recognizer as a detector for speech-like sounds. Although singing (including a musical background) is a relatively poor match to an acoustic model trained on normal speech, we propose various statistics of the classifier's output to discriminate singing from instrumental accompaniment. A simple HMM allows us to find a best labeling sequence for this uncertain data. On a test set of forty 15-second excerpts of randomly selected music, our classifier achieved around 80% classification accuracy at the frame level. The utility of different features, and our plans for eventual lyrics recognition, are discussed.
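The labeling step mentioned in the abstract can be illustrated with a minimal sketch of Viterbi decoding over a two-state (non-vocal / vocal) HMM. This is not the authors' implementation: the record gives no model details, so the function name `viterbi_two_state`, the self-transition probability `p_stay`, the uniform prior, and the use of a single per-frame "vocal" probability as the observation score are all assumptions made for illustration.

```python
import math


def viterbi_two_state(log_lik, p_stay=0.9, prior=(0.5, 0.5)):
    """Most likely 0/1 (non-vocal/vocal) label sequence for a two-state HMM.

    log_lik: list of (loglik_nonvocal, loglik_vocal) pairs, one per frame.
    p_stay:  assumed probability of staying in the same state between frames.
    prior:   assumed initial state probabilities.
    """
    if not log_lik:
        return []
    states = range(2)
    # Symmetric transition matrix in the log domain (an assumption).
    log_trans = [[math.log(p_stay if i == j else 1.0 - p_stay) for j in states]
                 for i in states]
    # Initialise with the prior and the first frame's observation scores.
    score = [math.log(prior[s]) + log_lik[0][s] for s in states]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in log_lik[1:]:
        new_score, ptr = [], []
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame[s])
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log_lik(p_vocal):
    """Turn hypothetical per-frame vocal probabilities into log-score pairs."""
    return [(math.log(1.0 - p), math.log(p)) for p in p_vocal]


# One weak frame inside a vocal run is smoothed over by the transition penalty.
print(viterbi_two_state(to_log_lik([0.9, 0.9, 0.4, 0.9, 0.9])))  # [1, 1, 1, 1, 1]
```

With a high `p_stay`, isolated frames that weakly favor the other class are absorbed into the surrounding segment, while a sustained change in the scores still produces a segment boundary.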
Pages: 119-122
Page count: 4