Training radial basis function neural networks: effects of training set size and imbalanced training sets

Cited by: 30
Authors
Al-Haddad, L
Morris, CW
Boddy, L [1]
Affiliations
[1] Cardiff Univ, Cardiff Sch Biosci, Cardiff CF1 3TL, S Glam, Wales
[2] Univ Glamorgan, Sch Comp Studies, Pontypridd CF37 1DL, M Glam, Wales
Funding
Natural Environment Research Council (UK);
Keywords
identification; microalgae; flow cytometry; RBF neural networks; training neural networks;
DOI
10.1016/S0167-7012(00)00202-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Obtaining training data for constructing artificial neural networks (ANNs) to identify microbiological taxa is not always easy. Often, only small data sets with different numbers of observations per taxon are available. Here, the effect of both size of the training data set and of an imbalanced number of training patterns for different taxa is investigated using radial basis function ANNs to identify up to 60 species of marine microalgae. The best networks trained to discriminate 20, 40 and 60 species respectively gave overall percentage correct identification of 92, 84 and 77%. From 100 to 200 patterns per species was sufficient in networks trained to discriminate 20, 40 or 60 species. For 40 and 60 species data sets an imbalance in the number of training patterns per species always affected training success, the greater the imbalance the greater the effect. However, this could be largely compensated for by adjusting the networks using a posteriori probabilities, estimated as network output values. (C) 2000 Elsevier Science B.V. All rights reserved.
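The abstract states that the effect of imbalance could be largely compensated for by adjusting the networks using a posteriori probabilities estimated from the network outputs, but this record does not give the procedure. The following is a minimal sketch of one common form of such an adjustment, not necessarily the authors' method: each output is rescaled by the ratio of a target class prior to the class frequency in the training set, then renormalized. The function name `adjust_for_priors`, the uniform default target priors, the clipping of negative outputs, and the example numbers are all assumptions made for illustration.

```python
def adjust_for_priors(outputs, train_counts, target_priors=None):
    """Rescale network outputs, treated as posterior estimates, for class priors.

    outputs       -- one network output value per class (hypothetical values)
    train_counts  -- number of training patterns per class
    target_priors -- priors to correct towards; uniform if omitted (assumption)
    """
    n = len(outputs)
    total = sum(train_counts)
    # Class frequencies implied by the (possibly imbalanced) training set.
    train_priors = [c / total for c in train_counts]
    if target_priors is None:
        target_priors = [1.0 / n] * n
    # Network outputs can fall slightly below zero; clip them (assumption).
    clipped = [max(o, 0.0) for o in outputs]
    scaled = [o * t / p for o, t, p in zip(clipped, target_priors, train_priors)]
    norm = sum(scaled)
    if norm == 0.0:
        # No usable evidence: fall back to the target priors.
        return list(target_priors)
    return [s / norm for s in scaled]


# Two classes trained on 200 and 50 patterns; raw outputs favour class 0.
outputs = [0.6, 0.4]
train_counts = [200, 50]
adjusted = adjust_for_priors(outputs, train_counts)
predicted = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this example the raw outputs favour the over-represented class, while the adjusted values favour the class with fewer training patterns, which is the kind of shift such a correction is meant to produce.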
Pages: 33-44
Number of pages: 12