Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion

Cited by: 59
Authors
Doi, Hironori [1]
Toda, Tomoki [1]
Nakamura, Keigo [1]
Saruwatari, Hiroshi [1]
Shikano, Kiyohiro [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara 6300192, Japan
Keywords
Alaryngeal speech; eigenvoice conversion; laryngectomees; speech enhancement; voice conversion; frequency cepstral coefficients; statistical voice conversion; prediction
DOI
10.1109/TASLP.2013.2286917
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206 [Acoustics]
Abstract
In this paper, we present novel speaking-aid systems based on one-to-many eigenvoice conversion (EVC) to enhance three types of alaryngeal speech: esophageal speech, electrolaryngeal speech, and body-conducted silent electrolaryngeal speech. Although alaryngeal speech allows laryngectomees to utter speech sounds, it suffers from the lack of speech quality and speaker individuality. To improve the speech quality of alaryngeal speech, alaryngeal-speech-to-speech (AL-to-Speech) methods based on statistical voice conversion have been proposed. In this paper, one-to-many EVC capable of flexibly controlling the converted voice quality by adapting the conversion model to given target natural voices is further implemented for the AL-to-Speech methods to effectively recover speaker individuality of each type of alaryngeal speech. These proposed systems are compared with each other from various perspectives. The experimental results demonstrate that our proposed systems are capable of effectively addressing the issues of alaryngeal speech, e.g., yielding significant improvements in speech quality of each type of alaryngeal speech.
Pages: 172-183
Page count: 12