The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra

Cited by: 1100
Authors
Shilov, Ignat V. [1]
Seymour, Sean L. [1]
Patel, Alpesh A. [1]
Loboda, Alex [1]
Tang, Wilfred H. [1]
Keating, Sean P. [1]
Hunter, Christie L. [1]
Nuwaysir, Lydia M. [1]
Schaeffer, Daniel A. [1]
Affiliations
[1] Appl Biosyst Inc, Foster City, CA 94404 USA
DOI
10.1074/mcp.T600050-MCP200
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The Paragon™ Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.
Pages: 1638-1655
Page count: 18