Randomized sequence databases for tandem mass spectrometry peptide and protein identification

Cited by: 72
Authors
Higdon, R
Hogan, JM
Van Belle, G
Kolker, E
Affiliations
[1] BIATECH Inst, Bothell, WA 98011 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[3] Univ Washington, Dept Environm & Occupat Hlth Sci, Seattle, WA 98195 USA
DOI
10.1089/omi.2005.9.364
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications using, at best, generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with the organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward and then a randomized database, and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
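The randomized-database idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `rev_` and `shuf_` name prefixes, the `(protein_name, score)` hit format, and the estimator used for a combined search (twice the number of accepted randomized-database hits divided by all accepted hits, a common convention that assumes a wrong match is equally likely to land in either half of the combined database) are all assumptions made here for illustration.

```python
import random


def reverse_db(proteins):
    """Build a reversed database: each sequence is read back to front.

    `proteins` maps a protein name to its amino acid sequence.
    The "rev_" prefix is a hypothetical naming convention.
    """
    return {"rev_" + name: seq[::-1] for name, seq in proteins.items()}


def reshuffle_db(proteins, seed=0):
    """Build a reshuffled database: residues of each sequence are permuted.

    Length and amino acid composition are preserved per protein.
    The "shuf_" prefix is a hypothetical naming convention.
    """
    rng = random.Random(seed)  # fixed seed so the database is reproducible
    shuffled = {}
    for name, seq in proteins.items():
        residues = list(seq)
        rng.shuffle(residues)
        shuffled["shuf_" + name] = "".join(residues)
    return shuffled


def estimate_fp_rate(hits, threshold, decoy_prefixes=("rev_", "shuf_")):
    """Estimate the false positive rate for a combined search.

    `hits` is a list of (protein_name, score) pairs from a search of the
    forward database with a randomized database appended. Hits scoring at
    or above `threshold` are accepted. Assumed estimator:
    2 * (accepted randomized hits) / (all accepted hits), capped at 1.
    """
    accepted = [name for name, score in hits if score >= threshold]
    if not accepted:
        return 0.0
    n_decoy = sum(1 for name in accepted if name.startswith(decoy_prefixes))
    return min(1.0, 2 * n_decoy / len(accepted))
```

With such a function, a researcher could scan candidate thresholds and keep the lowest one whose estimated rate stays under a target error rate, which is the kind of threshold setting the abstract describes.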
Pages: 364-379
Number of pages: 16