SIGNAL MODELING TECHNIQUES IN SPEECH RECOGNITION

Cited by: 312
Author
PICONE, JW
Affiliation
[1] Central Research Laboratories, Texas Instruments, Inc., P.O. Box 655474, Dallas, TX 75265
DOI
10.1109/5.237532
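Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract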
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
Pages: 1215-1247
Page count: 33