A method for generating natural-sounding speech stimuli for cognitive brain research

Cited by: 117
Authors
Alku, P
Tiitinen, H
Näätänen, R
Affiliations
[1] Aalto Univ, Acoust Lab, FIN-02015 Helsinki, Finland
[2] Univ Helsinki, Dept Psychol, Cognit Brain Res Unit, Helsinki, Finland
Keywords
speech production; inverse filtering; speech synthesis; speech perception; auditory discrimination; mismatch negativity
DOI
10.1016/S1388-2457(99)00088-7
Chinese Library Classification number
R74 [Neurology and Psychiatry]
Abstract
Objective: In response to the rapidly increasing interest in using human voice in cognitive brain research, a new method, semisynthetic speech generation (SSG), is presented for generation of speech stimuli. Methods: The method synthesizes speech stimuli as a combination of purely artificial processes and processes that originate from the natural human speech production mechanism. SSG first estimates the source of speech, the glottal flow, from a natural utterance using an inverse filtering technique. The glottal flow obtained is then used as an excitation to an artificial digital filter that models the formant structure of speech. Results: SSG is superior to commercial voice synthesizers because it yields speech stimuli of a highly natural quality due to the contribution of the man-originating glottal excitation. Conclusion: The artificial modelling of the vocal tract enables one to adjust the formant frequencies of the stimuli as desired, thus making SSG suitable for cognitive experiments using speech sounds as stimuli. © 1999 Elsevier Science Ireland Ltd. All rights reserved.
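The two stages named in the abstract (removing the vocal tract filter to recover an excitation, then passing that excitation through an adjustable formant filter) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the vocal tract filter is already known, whereas the paper estimates it from a natural utterance, and it uses a synthetic pulse train in place of a real glottal flow. All function names, the sampling rate, and the formant frequencies and bandwidths are hypothetical values chosen for illustration.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients of a two-pole digital resonator modelling one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    b2 = -r * r
    a0 = 1.0 - b1 - b2  # normalises the gain to 1 at 0 Hz
    return a0, b1, b2

def resonate(x, coeffs):
    """All-pole filter: y[n] = a0*x[n] + b1*y[n-1] + b2*y[n-2]."""
    a0, b1, b2 = coeffs
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a0 * s + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def anti_resonate(y, coeffs):
    """FIR inverse of resonate: x[n] = (y[n] - b1*y[n-1] - b2*y[n-2]) / a0."""
    a0, b1, b2 = coeffs
    y1 = y2 = 0.0
    out = []
    for s in y:
        out.append((s - b1 * y1 - b2 * y2) / a0)
        y2, y1 = y1, s
    return out

def synthesize(excitation, formants, fs):
    """Pass an excitation through a cascade of formant resonators."""
    signal = list(excitation)
    for freq, bw in formants:
        signal = resonate(signal, resonator_coeffs(freq, bw, fs))
    return signal

def inverse_filter(speech, formants, fs):
    """Cancel a KNOWN formant cascade to recover the excitation.
    Assumption: the paper estimates this filter from natural speech instead."""
    signal = list(speech)
    for freq, bw in formants:
        signal = anti_resonate(signal, resonator_coeffs(freq, bw, fs))
    return signal

def placeholder_excitation(f0_hz, n_samples, fs):
    """Synthetic pulse train standing in for a glottal flow (not real data)."""
    period = int(round(fs / f0_hz))
    open_len = int(0.6 * period)
    out = []
    for n in range(n_samples):
        k = n % period
        out.append(math.sin(math.pi * k / open_len) ** 2 if k < open_len else 0.0)
    return out

fs = 16000  # hypothetical sampling rate in Hz
formants_a = [(700, 80), (1100, 90), (2500, 120)]   # illustrative /a/-like values
formants_i = [(300, 60), (2300, 100), (3000, 120)]  # illustrative /i/-like values

excitation = placeholder_excitation(100, 1600, fs)
vowel_a = synthesize(excitation, formants_a, fs)      # stands in for the natural utterance
recovered = inverse_filter(vowel_a, formants_a, fs)   # excitation estimate
vowel_i = synthesize(recovered, formants_i, fs)       # same source, new formants
```

Because the inverse filter here exactly cancels the known resonators, `recovered` matches `excitation` up to rounding error. With a real recording the vocal tract filter would have to be estimated, so the recovered excitation would only approximate the true glottal flow.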
Pages: 1329-1333
Number of pages: 5