Statistical model for large-scale peptide identification in databases from tandem mass spectra using SEQUEST

Times cited: 84
Authors
López-Ferrer, D
Martinez-Bartolomé, S
Villar, M
Campillos, M
Martín-Maroto, F
Vázquez, J
Affiliations
[1] Severo Ochoa CSIC, Ctr Biol Mol, Madrid 28049, Spain
[2] ThermoFinnigan, San Jose, CA 95134 USA
DOI
10.1021/ac049305c
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Subject classification code
070302; 081704
Abstract
Recent technological advances have made multidimensional peptide separation techniques coupled with tandem mass spectrometry the method of choice for high-throughput protein identification. These advances have created a pressing need for software tools that perform large-scale, fully automated, and unambiguous peptide identification. In this work, using the nuclear proteome of Jurkat cells as a model, we present a processing algorithm that accurately predicts random-matching distributions based on the two SEQUEST scores, Xcorr and DeltaCn. Our method permits a simple and precise calculation of the probabilities associated with individual peptide assignments, as well as of the false discovery rate among the peptides identified in any experiment. Further mathematical analysis demonstrates that the score distributions depend strongly on database size and precursor mass window, and suggests that the probability associated with SEQUEST scores depends on the number of candidate peptide sequences available for the search. Our results highlight the importance of adapting the filtering criteria used to discriminate between correct and incorrect peptide sequences to the circumstances of each particular experiment.
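The abstract's point that score significance depends on the number of candidate sequences can be illustrated with a generic argument. This is a minimal sketch, not the authors' Xcorr/DeltaCn model. It assumes candidate scores are independent and that a single random candidate reaches a given score with probability p_single; the chance that the best of N random candidates does so then grows as 1 - (1 - p_single)^N. A larger database or a wider precursor mass window raises N. The second function is a standard Benjamini-Hochberg-style FDR estimate that conservatively treats every spectrum as a potential random match. The function names `best_of_n_random_prob` and `estimated_fdr` are hypothetical and used only for illustration.

```python
from typing import Sequence


def best_of_n_random_prob(p_single: float, n_candidates: int) -> float:
    """Chance that the best of n random candidates reaches a score that one
    random candidate reaches with probability p_single.

    Illustrative assumption: candidate scores are independent."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    # Complement of the event "no candidate reaches the score".
    return 1.0 - (1.0 - p_single) ** n_candidates


def estimated_fdr(p_values: Sequence[float], threshold: float) -> float:
    """Benjamini-Hochberg-style FDR estimate for all assignments with
    p-value <= threshold, conservatively assuming a null proportion of 1."""
    accepted = sum(1 for p in p_values if p <= threshold)
    if accepted == 0:
        return 0.0
    # Expected number of random matches at this threshold, divided by the
    # number of accepted assignments, capped at 1.
    return min(1.0, len(p_values) * threshold / accepted)


if __name__ == "__main__":
    # Same per-candidate probability, increasingly many candidates.
    for n in (1, 10, 100, 1000):
        print(n, round(best_of_n_random_prob(0.001, n), 4))
    print(round(estimated_fdr([0.001, 0.002, 0.01, 0.2, 0.5], 0.01), 4))
```

Running the script shows how quickly a fixed per-candidate probability becomes a large best-of-N probability as candidates accumulate. This is the qualitative reason a fixed score cutoff cannot carry the same meaning across databases and mass windows. The authors' actual procedure, built on the empirical Xcorr and DeltaCn distributions, is not reproduced here.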
Pages: 6853-6860
Number of pages: 8