Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using a Small Molecular Data Set

Times cited: 8
Authors
Andonie, Razvan [1, 2]
Fabry-Asztalos, Levente [3]
Abdul-Wahid, Christopher Badi' [1, 3]
Abdul-Wahid, Sarah [1]
Barker, Grant I. [3]
Magill, Lukas C. [1]
Affiliations
[1] Central Washington University, Department of Computer Science, Ellensburg, WA 98926, USA
[2] Transylvania University of Brasov, Electronics and Computers Department, Brasov, Romania
[3] Central Washington University, Department of Chemistry, Ellensburg, WA 98926, USA
Keywords
Fuzzy neural networks; evolutionary computing and genetic algorithms; computational chemistry; data mining; probabilistic neural networks; field analysis; 4-hydroxy-5,6-dihydropyrones; backpropagation; regression; algorithm; system
DOI
10.1109/TCBB.2009.50
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Obtaining satisfactory results with neural networks depends on the availability of large data samples; small training sets generally reduce performance. However, most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of the biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques that are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR [1], a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each training sample pair is assigned a relevance factor proportional to the importance of that pair. The two algorithms proposed in this paper are: 1) GA-FAMR, a new algorithm with two stages: a) in the first stage, a genetic algorithm (GA) optimizes the relevances assigned to the training data, which improves the generalization capability of the FAMR; b) in the second stage, the optimized relevances are used to train the FAMR. 2) Ordered FAMR, derived from a known algorithm, which optimizes the order of data presentation instead of the relevances, using the algorithm of Dagher et al. [2], [3]. In our experiments, we compare these two algorithms with FS-GA-FNN, an algorithm not based on FAM that was introduced in [4], [5]. We conclude that, when inferring from small training sets, both techniques are efficient in terms of generalization capability and execution time. The computational overhead they introduce is offset by better accuracy. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
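The two-stage GA-FAMR idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the FAMR equations: `RelevanceFuzzyART` is a hypothetical stand-in that uses standard fuzzy ART category learning (complement coding, the choice function, the vigilance test, fast learning) together with a simplified map field that just sums each training pair's relevance per class label. The GA operators (truncation selection with elitism, one-point crossover, gene-redraw mutation), all parameter values, and the toy two-feature data set are likewise assumptions chosen for brevity.

```python
import random

def complement_code(x):
    # Fuzzy ART complement coding: [x, 1 - x]; features assumed scaled to [0, 1].
    return list(x) + [1.0 - v for v in x]

def fuzzy_and(a, b):
    # Fuzzy AND: component-wise minimum.
    return [min(u, v) for u, v in zip(a, b)]

class RelevanceFuzzyART:
    """Hypothetical stand-in for FAMR: fuzzy ART categories plus a map field
    that accumulates each training pair's relevance per class label.
    This is NOT the published FAMR update rule."""

    def __init__(self, rho=0.75, alpha=0.001):
        self.rho, self.alpha = rho, alpha  # vigilance, choice parameter
        self.w = []    # category weight vectors
        self.map = []  # per category: {class label: accumulated relevance}

    def _ranked(self, i):
        # Rank categories by the choice function |I ^ w_j| / (alpha + |w_j|).
        t = [sum(fuzzy_and(i, w)) / (self.alpha + sum(w)) for w in self.w]
        return sorted(range(len(self.w)), key=lambda j: -t[j])

    def train_one(self, x, label, relevance):
        i = complement_code(x)
        for j in self._ranked(i):
            # Vigilance test: |I ^ w_j| / |I| >= rho.
            if sum(fuzzy_and(i, self.w[j])) / sum(i) >= self.rho:
                self.w[j] = fuzzy_and(i, self.w[j])  # fast learning (beta = 1)
                break
        else:  # no category passed vigilance: commit a new one
            self.w.append(i)
            self.map.append({})
            j = len(self.w) - 1
        self.map[j][label] = self.map[j].get(label, 0.0) + relevance

    def predict(self, x):
        # Winning category, then the label with the most accumulated relevance.
        votes = self.map[self._ranked(complement_code(x))[0]]
        return max(votes, key=votes.get)

def fit(train, relevances, rho=0.75):
    model = RelevanceFuzzyART(rho=rho)
    for (x, y), q in zip(train, relevances):
        model.train_one(x, y, q)
    return model

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

def ga_optimize_relevances(train, valid, rho=0.75, pop_size=20,
                           generations=30, mutation_rate=0.1, seed=0):
    """Stage 1: evolve one relevance factor in [0, 1] per training pair,
    scoring each candidate vector by validation accuracy."""
    rng = random.Random(seed)
    n = len(train)

    def fitness(q):
        return accuracy(fit(train, q, rho), valid)

    # Include uniform relevances (ordinary training) in the initial population.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Mutation: redraw each gene with probability mutation_rate.
            children.append([rng.random() if rng.random() < mutation_rate else g
                             for g in child])
        pop = elite + children
    return max(pop, key=fitness)

def make_toy(m, rng):
    # Hypothetical toy data: two features in [0, 1], class 1 when a + b > 1.
    pts = [(rng.random(), rng.random()) for _ in range(m)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in pts]

data_rng = random.Random(1)
train, valid = make_toy(30, data_rng), make_toy(20, data_rng)
best = ga_optimize_relevances(train, valid, pop_size=10, generations=5)  # stage 1
final_model = fit(train, best)                                         # stage 2
print("validation accuracy:", accuracy(final_model, valid))
```

Seeding the population with the all-ones vector (every pair equally relevant, i.e. ordinary training) and keeping the best half each generation means the returned relevances never score worse on the validation split than plain training does. With a small data set that split is itself small, so any such gain should be checked on held-out compounds.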
Pages: 80-93
Number of pages: 14
Related papers (63 in total)
[1] Aldaraiseh, A. Proc. IEEE Int. Joint Conf. Neural Networks, 2006, p. 1391.
[2] Andonie, R. Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2005, p. 113.
[3] Andonie, R. Proc. IEEE Int. Joint Conf. Neural Networks, 2006, p. 7495.
[4] Andonie, Razvan; Sasu, Lucian. Fuzzy ARTMAP with input relevances. IEEE Transactions on Neural Networks, 2006, 17(4): 929-941.
[5] Andonie, Razvan; Magill, Lukas; Fabry-Asztalos, Levente; Abdul-Wahid, Sarah. A new Fuzzy ARTMAP approach for predicting biological activity of potential HIV-1 protease inhibitors. 2007 IEEE International Conference on Bioinformatics and Biomedicine, Proceedings, 2007, p. 56.
[6] [Anonymous]. Elementary mathematical theory of classification and prediction. 1958.
[7] [Anonymous]. Statistical Learning Theory.
[8] [Anonymous]. P 6 INT C INT SYST D.
[9] Bianucci, A. M.; Micheli, A.; Sperduti, A.; Starita, A. Application of cascade correlation networks for structures to chemistry. Applied Intelligence, 2000, 12(1-2): 117-146.
[10] Bianucci, A. M. SOFT COMPUTING APPRO, 2000.