Analysing the localisation sites of proteins through neural networks ensembles

Cited by: 7
Authors
Anastasiadis, Aristoklis D. [1]
Magoulas, George D. [1]
Affiliation
[1] Birkbeck College, University of London, School of Computer Science and Information Systems, London WC1E 7HX, England
Keywords
feedforward neural networks; neural ensembles; protein localisation; imbalanced datasets; K-nearest neighbour
DOI
10.1007/s00521-006-0029-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scientists working in proteomics are currently seeking integrated, customised and validated research solutions to expedite proteomic analyses and drug discovery. Some drugs and most of their cellular targets are proteins, because proteins determine biological phenotype. In this context, automated analysis of protein localisation is more complex than automated analysis of DNA sequences; nevertheless, the potential benefits are of the same or greater importance. Achieving this depends on choosing the right methods for the application, which is crucial when the dataset is drastically imbalanced. In this paper we investigate the performance of some commonly used classifiers, namely K-nearest neighbours (KNN) and feed-forward neural networks with and without cross-validation, on a class of imbalanced problems from the bioinformatics domain. Furthermore, we construct ensemble-based schemes using the notion of diversity and empirically test their performance on the same problems. The experimental results favour neural network ensembles, which generalise well and yield a significant improvement over the single-classifier methods.
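The abstract names the techniques but not their mechanics. As a rough illustration only, the sketch below shows a K-nearest-neighbour classifier, a plurality-vote combiner for an ensemble, the pairwise disagreement measure as one common way to quantify ensemble diversity, and per-class accuracy, which is more informative than overall accuracy on imbalanced data. This is not the authors' implementation, whose details are not given here. All function names and the toy data are hypothetical. The paper's ensembles are built from feed-forward neural networks, which this sketch does not reproduce; any classifier's predictions can be fed to the combiner.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    # Order training indices from nearest to farthest from the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties go to the class encountered first, i.e. the one with the nearer neighbour.
    return votes.most_common(1)[0][0]


def majority_vote(predictions):
    """Combine per-classifier label lists (one list per member) by plurality vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def disagreement(pred_a, pred_b, truth):
    """Fraction of samples where exactly one of the two classifiers is correct."""
    return sum((a == t) != (b == t) for a, b, t in zip(pred_a, pred_b, truth)) / len(truth)


def per_class_accuracy(pred, truth):
    """Accuracy within each true class, so minority classes are not masked."""
    totals = Counter(truth)
    hits = Counter(t for p, t in zip(pred, truth) if p == t)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Hypothetical toy data: class "A" is the majority class, "B" the minority.
    train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
    train_y = ["A", "A", "A", "B", "B"]
    print(knn_predict(train_x, train_y, (0.95, 1.0), k=3))
    print(majority_vote([["A", "B", "A"], ["A", "A", "B"], ["B", "B", "A"]]))
    print(disagreement(["A", "B"], ["A", "A"], ["A", "A"]))
    print(per_class_accuracy(["A", "A", "B"], ["A", "B", "B"]))
```

Members that are individually accurate but disagree on different samples give the plurality vote something to correct, which is the intuition behind using diversity when building ensembles.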
Pages: 277-288
Page count: 12