WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications

Cited by: 873
Authors
Morise, Masanori [1]
Yokomori, Fumiya [2]
Ozawa, Kenji [1]
Affiliations
[1] Univ Yamanashi, Interdisciplinary Grad Sch, Kofu, Yamanashi 4008511, Japan
[2] Univ Yamanashi, Grad Sch Med & Engn Sci, Dept Educ, Kofu, Yamanashi 4008511, Japan
Keywords
speech analysis; speech synthesis; vocoder; sound quality; real-time processing; TANDEM-STRAIGHT; PHASE; REPRESENTATION; ESTIMATOR; SPECTRUM; SIGNALS; F0;
DOI
10.1587/transinf.2015EDP7457
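Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
080201 [Mechanical manufacturing and automation];
Abstract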
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis on the basis of vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system offers not only high sound quality but also fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech, including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and the real-time factor (RTF) indicated that it was fast enough for real-time processing.
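The abstract reports a real-time factor without defining it. As a minimal sketch, assuming the common definition (processing time divided by the duration of the audio processed, so a value below 1 means faster than real time), RTF could be measured as follows. The function names `real_time_factor` and `measure_rtf` are hypothetical and are not part of WORLD or of the paper.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process, samples, sample_rate: int) -> float:
    """Time one call of `process` on `samples` and return its RTF.

    `process` is any callable standing in for an analysis/synthesis
    pipeline (a hypothetical placeholder, not a WORLD API).
    """
    start = time.perf_counter()
    process(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # One second of silence at 16 kHz, processed by a trivial stand-in.
    dummy_audio = [0.0] * 16000
    rtf = measure_rtf(lambda x: sum(x), dummy_audio, 16000)
    print(f"RTF = {rtf:.6f} (below 1.0 means faster than real time)")
```

Under this definition, a system that needs 0.5 s to process 2 s of speech has an RTF of 0.25.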
Pages: 1877-1884
Page count: 8
Related papers
38 in total
[1]
Agiomyrgiannakis Y, 2015, INT CONF ACOUST SPEE, P4230, DOI 10.1109/ICASSP.2015.7178768
[2]
[Anonymous], 1995, PROC EUROPEAN C SPEE
[3]
[Anonymous], 2003, METHOD SUBJECTIVE AS
[4]
[Anonymous], 2005, P INT 2005 LISB PORT
[5]
[Anonymous], 1983, PITCH DETERMINATION, DOI 10.1007/978-3-642-81926-1
[6]
SPEECH ANALYSIS AND SYNTHESIS BY LINEAR PREDICTION OF SPEECH WAVE [J].
ATAL, BS ;
HANAUER, SL .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1971, 50 (02) :637-+
[7]
Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation [J].
Banno, Hideki ;
Hata, Hiroaki ;
Morise, Masanori ;
Takahashi, Toru ;
Irino, Toshio ;
Kawahara, Hideki .
ACOUSTICAL SCIENCE AND TECHNOLOGY, 2007, 28 (03) :140-146
[8]
A sawtooth waveform inspired pitch estimator for speech and music [J].
Camacho, Arturo ;
Harris, John G. .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 124 (03) :1638-1652
[9]
YIN, a fundamental frequency estimator for speech and music [J].
de Cheveigné, A ;
Kawahara, H .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 111 (04) :1917-1930
[10]
Remaking speech [J].
Dudley, H .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1939, 11 (02) :169-177