FASTER TRAINING USING FUSION OF ACTIVATION FUNCTIONS FOR FEED FORWARD NEURAL NETWORKS

Cited by: 21
Authors
Asaduzzaman, Md. [1 ]
Shahjahan, Md. [1 ]
Murase, Kazuyuki [2 ]
Affiliations
[1] KUET, Dept Elect & Elect Engn, Khulna 9203, Bangladesh
[2] Univ Fukui, Dept Human & Artificial Intelligence Syst, Fukui 9108705, Japan
Keywords
Neural network; training; activation function; convergence; combination of activation functions; backpropagation; algorithms
DOI
10.1142/S0129065709002130
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multilayer feed-forward neural networks are widely used and are typically trained by minimizing an error function. Backpropagation (BP) is a well-known training method for such multilayer networks, but it often suffers from slow convergence. To speed up learning, we propose 'Fusion of Activation Functions' (FAF), in which different conventional activation functions (AFs) are combined to compute the final activation. This approach has not yet been studied extensively. A secondary goal of the paper is to examine the role of linear AFs in the combination. We investigate whether FAF can make learning faster. The validity of the proposed method is examined through simulations on nine challenging real-world benchmark classification and time-series prediction problems. FAF is applied to the 2-bit, 3-bit, and 4-bit parity problems and to the breast cancer, diabetes, heart disease, iris, wine, glass, and soybean classification problems. The algorithm is also tested on the Mackey-Glass chaotic time-series prediction problem. FAF is shown to perform better than BP using single AFs such as the sigmoid (SIG), arctangent (ATAN), and logarithmic (LOG) functions.
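The abstract describes FAF only as a combination of conventional activation functions (sigmoid, arctangent, logarithmic); the exact combination rule is not given in this record. The following Python/NumPy sketch is therefore purely illustrative: the function names, the rescaling of the arctangent, the sign-preserving logarithmic form, and the fixed equal fusion weights are assumptions, not the authors' formulation.

import numpy as np

def sigmoid(x):
    # conventional logistic sigmoid AF
    return 1.0 / (1.0 + np.exp(-x))

def atan_af(x):
    # arctangent AF, rescaled to (0, 1) for comparability with the sigmoid (assumed scaling)
    return 0.5 + np.arctan(x) / np.pi

def log_af(x):
    # sign-preserving logarithmic AF, a common form in the literature (assumed here)
    return np.sign(x) * np.log1p(np.abs(x))

def fused_activation(x, weights=(1.0 / 3, 1.0 / 3, 1.0 / 3)):
    # Illustrative fusion: a weighted combination of the three conventional AFs.
    # Whether the weights are fixed or trained follows the paper; equal fixed
    # weights are assumed only for this sketch.
    w_sig, w_atan, w_log = weights
    return w_sig * sigmoid(x) + w_atan * atan_af(x) + w_log * log_af(x)

if __name__ == "__main__":
    z = np.linspace(-3.0, 3.0, 7)
    print(fused_activation(z))

In a BP network, fused_activation would simply replace the single hidden-unit AF, with its derivative used in the weight-update rule as usual.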
Pages: 437-448
Number of pages: 12