A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific digestion and post-translational modifications

Cited by: 28
Authors
Lu, Bingwen [1]
Chen, Ting [1]
Affiliations
[1] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
DOI
10.1093/bioinformatics/btg1068
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by some unknown or non-specific enzyme, or when the target peptides have post-translational modifications (PTMs). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides of non-specific digestion by unknown enzymes or to amino acid modifications. In another case, scientists may choose to use non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic studies, because not all proteins are amenable to digestion by site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weights suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a great deal of computational time.
Overview: The present study was designed to speed up the database searching process for both cases. Specifically, we employed an approach combining the suffix tree data structure and the spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold at a given statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. We then rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
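The search idea in the abstract (index the database once, then walk the index against the spectrum) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: an uncompressed, depth-bounded suffix trie stands in for a true suffix tree, a plain set of integer prefix masses stands in for the spectrum graph, nominal residue masses replace monoisotopic ones, and a fixed `min_matches` count replaces the statistically derived threshold. All names (`build_suffix_trie`, `search_trie`, `min_matches`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Nominal (integer) residue masses in daltons: an assumption made to keep
# the arithmetic exact; real tools use monoisotopic masses with a tolerance.
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def build_suffix_trie(sequence: str, max_depth: int) -> dict:
    """Index every substring of up to max_depth residues as nested dicts."""
    root: dict = {}
    for start in range(len(sequence)):
        node = root
        for residue in sequence[start:start + max_depth]:
            node = node.setdefault(residue, {})
    return root


def search_trie(trie: dict, peaks: Set[int], parent_mass: int,
                min_matches: int) -> List[Tuple[str, int]]:
    """Return (peptide, matched_prefix_peaks) for every indexed substring
    whose residue masses sum to parent_mass and whose proper prefixes hit
    at least min_matches peaks. No enzyme cleavage rule is assumed."""
    hits: List[Tuple[str, int]] = []
    stack = [(trie, "", 0, 0)]  # (node, peptide so far, mass so far, matches)
    while stack:
        node, peptide, mass, matched = stack.pop()
        for residue, child in node.items():
            new_mass = mass + RESIDUE_MASS[residue]
            if new_mass > parent_mass:
                continue  # prune: any extension is heavier still
            if new_mass == parent_mass:
                if matched >= min_matches:
                    hits.append((peptide + residue, matched))
                continue
            hit = 1 if new_mass in peaks else 0
            stack.append((child, peptide + residue, new_mass, matched + hit))
    # Best-supported candidates first, ties broken alphabetically.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    trie = build_suffix_trie("MKVLAGSPEPTIDEK", max_depth=10)
    # Prefix masses of PEPTIDE plus two noise peaks (300 and 500).
    peaks = {97, 226, 323, 424, 537, 652, 300, 500}
    print(search_trie(trie, peaks, parent_mass=781, min_matches=4))
```

Because every substring is indexed, candidates are not restricted to cleavage sites of a particular enzyme, which is the non-specific digestion case the abstract targets. In this toy example two substrings share the parent mass, and only the peak-match count separates them.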
Pages: II113-II121
Number of pages: 9