High-throughput identification of proteins and unanticipated sequence modifications using a mass-based alignment algorithm for MS/MS de novo sequencing results

Cited by: 107
Authors
Searle, BC
Dasari, S
Turner, M
Reddy, AP
Choi, DS
Wilmarth, PA
McCormack, AL
David, LL
Nagalla, SR [1]
Affiliations
[1] Oregon Hlth & Sci Univ, Dept Pediat, Portland, OR 97239 USA
[2] Oregon Hlth & Sci Univ, Dept Publ Hlth & Prevent Med, Portland, OR 97239 USA
[3] Oregon Hlth & Sci Univ, Sch Dent, Portland, OR 97239 USA
[4] Oregon Hlth & Sci Univ, Oregon Natl Primate Res Ctr, Beaverton, OR 97006 USA
DOI
10.1021/ac035258x
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
With the increasing availability of de novo sequencing algorithms for interpreting high mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
Pages: 2220-2230
Page count: 11