Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models

Cited by: 19
Authors
Fernandez-Lozano, Carlos [1]
Cuinas, Ruben F. [1]
Seoane, Jose A. [2]
Fernandez-Blanco, Enrique [1]
Dorado, Julian [1]
Munteanu, Cristian R. [1,3]
Affiliations
[1] Univ A Coruna, Fac Comp Sci, Informat & Commun Technol Dept, La Coruna 15071, Spain
[2] Univ Bristol, Bristol Genet Epidemiol Labs, Sch Social & Community Med, Bristol BS82BN, Avon, England
[3] Maastricht Univ, Dept Bioinformat BiGCaT, NL-6200 MD Maastricht, Netherlands
Keywords
Feature selection; SVM-RFE; Topological indices; Signal transduction pathway; FEATURE-SELECTION TECHNIQUES; PERFORMANCE-MEASURES; PART I; QSAR; REPRESENTATION; RECEPTORS; INHIBITORS; PEPTIDES; SEQUENCE; NETWORK;
DOI
10.1016/j.jtbi.2015.07.038
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Signaling proteins are an important topic in drug development, given the growing need for fast, accurate and cheap methods to evaluate new molecular targets involved in specific diseases. The complexity of protein structure hinders any direct association between signaling activity and molecular structure. Therefore, the proposed solution uses protein star graphs to encode peptide sequence information into specific topological indices calculated with the S2SNet tool. The Quantitative Structure-Activity Relationship (QSAR) classification model obtained with Machine Learning techniques is able to predict new signaling peptides. The best classification model, which is also the first signaling prediction model, is based on eleven descriptors; it was obtained using the Support Vector Machines-Recursive Feature Elimination (SVM-RFE) technique with the Laplacian kernel (RFE-LAP) and reached an AUROC of 0.961. The prediction performance of the model was assessed by testing a set of 3114 proteins of unknown function from the PDB. Important signaling pathways are presented for three UniProt IDs (34 PDB entries) with a signaling prediction greater than 98.0%. (C) 2015 Elsevier Ltd. All rights reserved.
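The encoding step described in the abstract, going from a peptide sequence to a star graph and then to topological indices, can be illustrated with a small sketch. It assumes the usual star-graph construction: a non-residue centre vertex with one branch per amino-acid type, each branch chaining that type's residues in sequence order, plus an optional "embedded" variant that also links residues adjacent in the original sequence. The function names `star_graph`, `wiener_index` and `randic_index` are hypothetical, the graph is unweighted, and only two classic indices (Wiener and Randić) are computed as examples. This is not the S2SNet implementation, and it does not reproduce the paper's eleven-descriptor SVM-RFE model.

```python
from collections import deque
from math import sqrt


def star_graph(sequence, embedded=False):
    """Return an adjacency dict (node -> set of neighbours) for a peptide star graph.

    Assumed construction (simplified, unweighted): node 0 is the non-residue
    centre and node i (1-based) is residue i. Each amino-acid type forms one
    branch hanging from the centre, with its residues chained in sequence order.
    With embedded=True, residues that are consecutive in the original sequence
    are also linked (peptide-bond edges); sets absorb any duplicate edges.
    """
    adj = {node: set() for node in range(len(sequence) + 1)}
    last_in_branch = {}  # amino-acid letter -> last node added to its branch
    for i, aa in enumerate(sequence, start=1):
        prev = last_in_branch.get(aa, 0)  # first residue of a type attaches to the centre
        adj[prev].add(i)
        adj[i].add(prev)
        last_in_branch[aa] = i
    if embedded:
        for i in range(1, len(sequence)):
            adj[i].add(i + 1)
            adj[i + 1].add(i)
    return adj


def wiener_index(adj):
    """Wiener index: sum of shortest-path distances over all unordered node pairs."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances in an unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted once from each end


def randic_index(adj):
    """Randic connectivity index: sum over edges (u, v) of 1 / sqrt(deg(u) * deg(v))."""
    return sum(1.0 / sqrt(len(adj[u]) * len(adj[v]))
               for u in adj for v in adj[u] if u < v)


if __name__ == "__main__":
    for embedded in (False, True):
        g = star_graph("ACA", embedded=embedded)
        print("embedded" if embedded else "plain", wiener_index(g), round(randic_index(g), 4))
```

For the toy sequence "ACA" the plain star graph is the path C, centre, A, A, so its Wiener index is 10, as for any four-vertex path; adding the embedded sequence edges shortens paths and lowers it to 7. Per-protein descriptor vectors of this kind are the inputs from which a feature-selection method such as SVM-RFE would then choose.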
Pages: 50-58
Number of pages: 9