Isoelectric point optimization using peptide descriptors and support vector machines

Cited by: 30
Authors
Perez-Riverol, Yasset [1, 4]
Audain, Enrique [2]
Millan, Aleli [1]
Ramos, Yassel [1]
Sanchez, Aniel [1]
Vizcaino, Juan Antonio [4]
Wang, Rui [4]
Mueller, Markus [3]
Machado, Yoan J. [2]
Betancourt, Lazaro H. [1]
Gonzalez, Luis J. [1]
Padron, Gabriel [1]
Besada, Vladimir [1]
Affiliations
[1] Ctr Genet Engn & Biotechnol, Dept Prote, Havana, Cuba
[2] Ctr Mol Immunol, Dept Prote, Havana, Cuba
[3] Swiss Inst Bioinformat, Proteome Informat Grp, CH-1211 Geneva, Switzerland
[4] European Bioinformat Inst, EMBL Outstn, Cambridge, England
Keywords
Isoelectric point; Support vector machine; Peptide descriptors; TANDEM MASS-SPECTROMETRY; IMMOBILIZED PH GRADIENTS; AMINO-ACID-SEQUENCES; SHOTGUN PROTEOMICS; PREDICTION; ACCURACY; PROTEINS; IDENTIFICATION; DATABASE
DOI
10.1016/j.jprot.2012.01.029
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
IPG (Immobilized pH Gradient) based separations are frequently used as the first step in shotgun proteomics methods; they increase both the dynamic range and the resolution of peptide separation prior to LC-MS analysis. Experimental isoelectric point (pI) values can improve peptide identifications in conjunction with MS/MS information. Thus, accurate estimation of the pI value from the amino acid sequence is critical for these kinds of experiments. Currently, pI is commonly predicted using the charge-state model [1] and/or the cofactor algorithm [2]. However, neither of these methods can accurately calculate the pI value for basic peptides. In this manuscript, we present a new approach that can significantly improve pI estimation by using Support Vector Machines (SVM) [3], an experimental amino acid descriptor taken from the AAIndex database [4], and the isoelectric point predicted by the charge-state model. Our results show a strong correlation (R² = 0.98) between the predicted and observed values, with a standard deviation of 0.32 pH units across the complete pH range. © 2012 Elsevier B.V. All rights reserved.
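The charge-state model mentioned in the abstract can be illustrated with a minimal sketch: sum the Henderson-Hasselbalch charge of every ionizable group and search for the pH at which the net charge is zero. The pKa values, the function names `net_charge` and `charge_state_pi`, and the bisection search are assumptions made for illustration; the record does not state which pKa set or solver the authors used, and the SVM step is not reproduced here.

```python
# Illustrative pKa values (an assumption; not taken from the paper).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups: the N-terminus is positive when protonated,
    # the C-terminus is negative when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def charge_state_pi(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Estimate pI by bisection on the pH interval [low, high].

    Net charge decreases monotonically with pH, so the zero crossing
    is bracketed as long as the charge changes sign on the interval.
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: pI lies at higher pH
        else:
            high = mid  # negative or zero: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "KKKK"):
        print(peptide, round(charge_state_pi(peptide), 2))
```

In the approach the abstract describes, a value like this would be one input feature alongside the AAIndex descriptor, with the SVM trained against experimentally observed pI values. For sequences with very many arginines the charge may not change sign below pH 14, in which case this sketch simply converges to the upper bound.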
Pages: 2269-2274
Page count: 6