Complex-valued autoencoders

Times cited: 36
Authors
Baldi, Pierre [1 ]
Lu, Zhiqin [2 ]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Dept Math, Irvine, CA 92697 USA
Funding
National Science Foundation (USA);
Keywords
Autoencoders; Unsupervised learning; Complex numbers; Complex neural networks; Critical points; Linear networks; Principal component analysis; EM algorithm; Deep architectures; Differential geometry;
DOI
10.1016/j.neunet.2012.04.011
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Autoencoders are unsupervised machine learning circuits, with typically one hidden layer, whose learning goal is to minimize an average distortion measure between inputs and outputs. Linear autoencoders correspond to the special case where only linear transformations between visible and hidden variables are used. While linear autoencoders can be defined over any field, only real-valued linear autoencoders have been studied so far. Here we study complex-valued linear autoencoders where the components of the training vectors and adjustable matrices are defined over the complex field with the L-2 norm. We provide simpler and more general proofs that unify the real-valued and complex-valued cases, showing that in both cases the landscape of the error function is invariant under certain groups of transformations. The landscape has no local minima, a family of global minima associated with Principal Component Analysis, and many families of saddle points associated with orthogonal projections onto subspaces spanned by sub-optimal subsets of eigenvectors of the covariance matrix. The theory yields several iterative, convergent learning algorithms, a clear understanding of the generalization properties of the trained autoencoders, and can equally be applied to the hetero-associative case when external targets are provided. Partial results on deep architectures as well as the differential geometry of autoencoders are also presented. The general framework described here is useful to classify autoencoders and identify general properties that ought to be investigated for each class, illuminating some of the connections between autoencoders, unsupervised learning, clustering, Hebbian learning, and information theory. (C) 2012 Elsevier Ltd. All rights reserved.
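
As a quick numerical check of the abstract's central claim, the sketch below (assuming NumPy; the variable names X, Sigma, U_p, A, B are illustrative and not drawn from the paper) builds the Hermitian covariance of centered complex data, takes its top eigenvectors, and verifies that the projection A B onto that eigenspace gives a reconstruction error equal to the sum of the discarded eigenvalues, which is the PCA characterization of the global minima described above.

import numpy as np

# Illustrative sketch only: the global minima of a complex-valued linear
# autoencoder x -> A B x act as the orthogonal projector onto the span of the
# top eigenvectors of the data covariance, and the residual error equals the
# sum of the discarded eigenvalues.

rng = np.random.default_rng(0)
n, d, p = 500, 8, 3                        # samples, input dim, hidden dim

# Complex training data with some anisotropy; rows are samples, centered.
Z = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
W = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
X = Z @ W
X -= X.mean(axis=0)

Sigma = X.conj().T @ X / n                 # Hermitian covariance matrix (d x d)
eigvals, eigvecs = np.linalg.eigh(Sigma)   # real eigenvalues, ascending order
order = np.argsort(eigvals)[::-1]
U_p = eigvecs[:, order[:p]]                # top-p eigenvectors (d x p)

# Optimal decoder/encoder pair, up to an invertible change of hidden
# coordinates: A = U_p, B = U_p^H, so A @ B is the orthogonal projector
# onto the top-p eigenspace.
A, B = U_p, U_p.conj().T
P = A @ B
X_hat = X @ P                              # rows are samples; projector acts on the right

# Mean reconstruction error equals the sum of the discarded eigenvalues.
err = np.mean(np.sum(np.abs(X - X_hat) ** 2, axis=1))
print(err, eigvals[order[p:]].sum())       # the two values agree up to rounding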
Pages: 136-147
Page count: 12