DIPHONE SPEECH SYNTHESIS

Cited by: 6

Authors:

OSHAUGHNESSY, D

BARBEAU, L

BERNARDI, D

ARCHAMBAULT, D

Affiliation:

[1] INRS-Telecommunications, Nuns Island, Que, Can

Source:

SPEECH COMMUNICATION | 1988 / Vol. 7 / No. 1

Keywords:

WAVEFORM ANALYSIS;

DOI:

10.1016/0167-6393(88)90021-0

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

Text-to-speech synthesis requires two steps: linguistic processing (to convert text into phonemes and intonation parameters) and simulation of speech production (to generate the speech waveform). We review different methods for the second task, emphasizing the advantages and disadvantages of the linear predictive coding (LPC) diphone approach. Diphones require more memory than phonemic units, since they must represent all possible spectral transitions between pairs of phonemes, but they directly capture many of the coarticulation effects that must otherwise be modeled in phonemic synthesis. Because spectra at diphone boundaries are similar, relatively simple interpolation suffices to join them.
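The two trade-offs named in the abstract (a larger unit inventory, and simple interpolation at unit boundaries) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of plain linear interpolation, and the representation of each frame as a list of spectral parameters are all assumptions made for illustration. The inventory figure is only an upper bound, since not every phoneme pair occurs in a given language.

```python
def diphone_inventory_size(num_phonemes):
    """Upper bound on the number of diphones: every ordered phoneme pair.

    Hypothetical helper; real inventories are smaller because some
    phoneme pairs never occur.
    """
    return num_phonemes * num_phonemes


def smooth_boundary(left_frames, right_frames, num_steps):
    """Linearly interpolate across a diphone boundary.

    left_frames / right_frames: lists of frames, each frame a list of
    spectral parameters (assumed here to be a representation that is
    safe to interpolate, such as reflection coefficients).
    Returns num_steps intermediate frames between the last frame of
    the left diphone and the first frame of the right diphone.
    """
    start = left_frames[-1]
    end = right_frames[0]
    bridge = []
    for step in range(1, num_steps + 1):
        w = step / (num_steps + 1)  # weight moves from near 0 to near 1
        bridge.append([(1 - w) * a + w * b for a, b in zip(start, end)])
    return bridge
```

For example, with an assumed set of 40 phonemes, `diphone_inventory_size(40)` gives at most 1600 diphones against 40 phoneme units, which is the memory cost the abstract refers to. Because the boundary frames are already spectrally close, a short linear bridge like the one above is usually enough to avoid an audible discontinuity.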

Citation

Pages: 55-65

Page count: 11

References: 22 in total

[1] SYNTHESIS OF SPEECH FROM UNRESTRICTED TEXT [J].

ALLEN, J.

PROCEEDINGS OF THE IEEE, 1976, 64 (04): 433-442

[2] [Anonymous], P INT C AC SPEECH SI

[3] BROWMAN C, 1980, P IEEE INT C ASSP, P561

[4] Caspers B., 1987, Proceedings: ICASSP 87. 1987 International Conference on Acoustics, Speech, and Signal Processing (Cat. No.87CH2396-0), P2388

[5] DETTWEILER H, 1985, ACUSTICA, V57, P268

[6] DETTWEILER H, P IEEE INT C ASSP, P752

[7] Dixon N.R., 1968, IEEE T AUDIO ELECTRO, V16, P40

[8] ELSENDOORN B, 1984, IPO ANN PROG REP, V19, P32

[9] TEMPORAL ORGANIZATION OF ARTICULATORY MOVEMENTS AS A MULTIDIMENSIONAL PHRASAL STRUCTURE [J].

FUJIMURA, O.

PHONETICA, 1981, 38 (1-3): 66-83

[10] FUJIMURA O, 1978, SYLLABLES SEGMENTS, P00107
