A statistical model for identifying proteins by tandem mass spectrometry

Cited by: 3749
Authors
Nesvizhskii, AI [1]
Keller, A [1]
Kolker, E [1]
Aebersold, R [1]
Affiliations
[1] Inst Syst Biol, Seattle, WA 98103 USA
DOI
10.1021/ac0341261
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.
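The apportioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and leaves out whatever refinements the full paper describes. It assumes each peptide already carries a probability of being correctly assigned, treats a protein as present if at least one of its weighted peptides is correct, and re-shares each peptide among its candidate proteins in proportion to the current protein probabilities. The function name `protein_probabilities`, its parameters, and the toy data are hypothetical.

```python
from math import prod


def protein_probabilities(peptide_probs, peptide_to_proteins, n_iter=50):
    """Simplified EM-style sketch (an assumption, not the published model).

    peptide_probs:       {peptide: probability the assignment is correct}
    peptide_to_proteins: {peptide: list of proteins containing that peptide}
    """
    proteins = sorted({q for qs in peptide_to_proteins.values() for q in qs})

    # Start by sharing every peptide equally among its candidate proteins.
    weights = {
        pep: {q: 1.0 / len(qs) for q in qs}
        for pep, qs in peptide_to_proteins.items()
    }
    prob = {q: 0.0 for q in proteins}

    for _ in range(n_iter):
        # Protein probability: at least one apportioned peptide is correct.
        for q in proteins:
            prob[q] = 1.0 - prod(
                1.0 - w[q] * peptide_probs[pep]
                for pep, w in weights.items()
                if q in w
            )
        # Re-apportion each peptide in proportion to protein probabilities.
        for pep, qs in peptide_to_proteins.items():
            total = sum(prob[q] for q in qs)
            if total > 0:
                weights[pep] = {q: prob[q] / total for q in qs}

    return prob


# Toy data (hypothetical): pepA is unique to P1, pepB is shared by P1 and P2.
demo = protein_probabilities(
    {"pepA": 0.9, "pepB": 0.8},
    {"pepA": ["P1"], "pepB": ["P1", "P2"]},
)
```

In this toy case the shared peptide drifts toward P1, which also has unique evidence, so P2 drops out of the list. This mirrors the "minimal protein list" behaviour the abstract describes.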
Pages: 4646-4658
Page count: 13