Extraction and representation of prosodic features for language and speaker recognition

Cited by: 121

Authors:

Mary, Leena [1]

Yegnanarayana, B. [2]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Speech & Vis Lab, Madras 600036, Tamil Nadu, India

[2] Int Inst Informat Technol, Dept Comp Sci & Engn, Hyderabad 500032, Andhra Pradesh, India

Source:

SPEECH COMMUNICATION | 2008 / Vol. 50 / Issue 10

Keywords:

Prosody; Vowel onset point; Intonation; Stress; Rhythm; Language recognition; Speaker recognition; Multilayer feedforward neural network; Autoassociative neural network;

DOI:

10.1016/j.specom.2008.04.010

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we propose a new approach for extracting and representing prosodic features directly from the speech signal. We hypothesize that prosody is linked to linguistic units such as syllables, and that it is manifested as changes in measurable parameters such as fundamental frequency (F0), duration, and energy. In this work, a syllable-like unit is chosen as the basic unit for representing the prosodic characteristics. Approximate segmentation of continuous speech into syllable-like units is obtained by locating the vowel onset points (VOPs) automatically. Knowledge of the VOPs serves as a reference for extracting prosodic features from the speech signal. Quantitative parameters are used to represent the F0 and energy contours in each region between two consecutive VOPs. Prosodic features extracted using this approach may be useful in applications such as language or speaker recognition, where explicit phoneme/syllable boundaries are not easily available. The effectiveness of the derived prosodic features for language and speaker recognition is evaluated on the NIST language recognition evaluation 2003 and the extended data task of the NIST speaker recognition evaluation 2003, respectively. © 2008 Elsevier B.V. All rights reserved.
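
The abstract describes representing the F0 and energy contours in each region between two consecutive VOPs with quantitative parameters, but it does not list those parameters. The following is a minimal sketch of that idea under stated assumptions: VOP locations are already available as frame indices, unvoiced frames carry an F0 of 0.0, and each region is summarized by duration, mean F0, F0 change, and energy change. The names `SegmentFeatures` and `segment_features`, the 10 ms frame shift, and the choice of parameters are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class SegmentFeatures:
    duration_s: float     # time between two consecutive VOPs
    f0_mean_hz: float     # mean F0 over the voiced frames of the region
    f0_change_hz: float   # last voiced F0 minus first voiced F0
    energy_change: float  # last-frame energy minus first-frame energy


def segment_features(
    f0: Sequence[float],
    energy: Sequence[float],
    vop_frames: Sequence[int],
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift
) -> List[SegmentFeatures]:
    """Summarize each region between consecutive VOPs (illustrative only).

    f0:         per-frame F0 in Hz; 0.0 marks an unvoiced frame (assumption).
    energy:     per-frame energy, same length as f0.
    vop_frames: sorted frame indices of detected vowel onset points.
    """
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")

    features: List[SegmentFeatures] = []
    # Pair each VOP with the next one to delimit a syllable-like region.
    for start, end in zip(vop_frames, vop_frames[1:]):
        voiced = [v for v in f0[start:end] if v > 0.0]
        if not voiced:
            continue  # no voiced frames, so no F0 parameters for this region
        seg_energy = energy[start:end]
        features.append(
            SegmentFeatures(
                duration_s=(end - start) * frame_shift_s,
                f0_mean_hz=mean(voiced),
                f0_change_hz=voiced[-1] - voiced[0],
                energy_change=seg_energy[-1] - seg_energy[0],
            )
        )
    return features
```

Three VOPs delimit two regions, so a call such as `segment_features(f0, energy, [2, 7, 10])` returns two feature records. A classifier for language or speaker recognition would then consume sequences of such records, though the paper's actual feature set and modeling details should be taken from the full text.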


Pages: 782-796

Page count: 15

