Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling

Cited by: 15
Authors
Variani, Ehsan [1]
Sainath, Tara N. [1]
Shafran, Izhak [1]
Bacchiani, Michiel [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
feature extraction; complex neural network; speech recognition; RECOGNITION
DOI
10.21437/Interspeech.2016-1459
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
State-of-the-art automatic speech recognition (ASR) systems typically rely on pre-processed features. This paper studies the time-frequency duality in ASR feature extraction methods and proposes extending the standard acoustic model with a complex-valued linear projection layer to learn and optimize features that minimize standard cost functions such as cross entropy. The proposed Complex Linear Projection (CLP) features achieve superior performance compared to pre-processed Log Mel features.
Pages: 808-812
Number of pages: 5