High-accuracy peptide mass fingerprinting using peak intensity data with machine learning

Times cited: 8
Authors
Yang, Dongmei [1, 3]
Ramkissoon, Kevin [1, 3]
Hamlett, Eric [1, 3]
Giddings, Morgan C. [1, 2, 3]
Affiliations
[1] Univ N Carolina, Dept Comp Sci, Dept Microbiol & Immunol, Chapel Hill, NC 27599 USA
[2] Univ N Carolina, Joint Dept Biomed Engn, Chapel Hill, NC 27599 USA
[3] N Carolina State Univ, Raleigh, NC 27695 USA
Keywords
ion suppression; mass spectrometry; peptide mass fingerprinting; protein identification; peak intensity
DOI
10.1021/pr070088g
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide-ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model's cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.
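The abstract describes training a classifier on labeled true-positive and false-positive PMF search results and reporting its cross-validated accuracy. The sketch below illustrates that general workflow only. It assumes a logistic-regression classifier (the abstract does not name the learning algorithm), a hypothetical feature helper `residue_counts` that counts the residues M, H, P, R, and L named in the abstract, and toy data. It is not the authors' model and does not reproduce their 91% figure.

```python
import math
import random
from typing import List, Sequence

# Residues the abstract reports as having the largest effect on ionization.
KEY_RESIDUES = "MHPRL"


def residue_counts(peptide: str) -> List[float]:
    """Hypothetical feature vector: counts of M, H, P, R, L in one peptide."""
    return [float(peptide.count(aa)) for aa in KEY_RESIDUES]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Label 1 (assumed: true-positive hit) if P(y=1 | x) >= 0.5, else 0."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5)


def cv_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validation: train on k-1 folds, score the held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)


if __name__ == "__main__":
    # Toy, linearly separable data standing in for labeled search results.
    X = [[float(v)] for v in range(-10, 0)] + [[float(v)] for v in range(1, 11)]
    y = [0] * 10 + [1] * 10
    print("cross-validated accuracy:", cv_accuracy(X, y))
    print("features for 'LLR':", residue_counts("LLR"))
```

Cross-validation matters here because accuracy measured on the training searches themselves would overstate how well the classifier separates true from false hits on new spectra.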
Pages: 62-69
Number of pages: 8