Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search

Times cited: 3891
Authors
Keller, A [1]
Nesvizhskii, AI [1]
Kolker, E [1]
Aebersold, R [1]
Affiliation
[1] Inst Syst Biol, Seattle, WA 98103 USA
DOI
10.1021/ac025747h
Chinese Library Classification number
O65 [Analytical chemistry]
Discipline classification code
070302; 081704
Abstract
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
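The abstract describes fitting a model with the expectation maximization (EM) algorithm so that each database search result receives a probability of being correct, based on its search score and the number of tryptic termini (NTT) of the peptide. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: each spectrum's top hit is summarized by a single discriminant score supplied by the caller, both the correct and the incorrect score distributions are modeled as Gaussians, and NTT (0, 1 or 2) is treated as independent of the score given correctness. The function names `em_peptide_probabilities` and `filter_by_error_rate` are hypothetical, and this is not the authors' implementation. The second function shows one way computed probabilities could be used to filter results at a chosen expected false identification rate, taken here as the mean of (1 - p) over the accepted assignments.

```python
import math
from typing import List


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_peptide_probabilities(scores: List[float], ntt: List[int],
                             n_iter: int = 100) -> List[float]:
    """Illustrative two-component mixture fitted by EM (hypothetical helper).

    scores: one discriminant score per spectrum's top database hit (assumed).
    ntt:    number of tryptic termini of that peptide, 0, 1 or 2.
    Returns the posterior probability that each assignment is correct,
    assuming Gaussian scores in both classes and NTT independent of the
    score given correct/incorrect.
    """
    n = len(scores)
    ranked = sorted(scores)
    # Start the "correct" component high and the "incorrect" component low.
    mu_c = ranked[int(0.75 * (n - 1))]
    mu_i = ranked[int(0.25 * (n - 1))]
    sd_c = sd_i = max((ranked[-1] - ranked[0]) / 4.0, 1e-3)
    prior_c = 0.5
    ntt_c = [1.0 / 3] * 3  # P(NTT = j | correct)
    ntt_i = [1.0 / 3] * 3  # P(NTT = j | incorrect)
    post = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability that each assignment is correct.
        for k, (s, t) in enumerate(zip(scores, ntt)):
            lc = prior_c * normal_pdf(s, mu_c, sd_c) * ntt_c[t]
            li = (1.0 - prior_c) * normal_pdf(s, mu_i, sd_i) * ntt_i[t]
            post[k] = lc / (lc + li) if lc + li > 0.0 else 0.0

        # M-step: re-estimate every parameter from posterior-weighted data.
        total_c = sum(post)
        w_c = max(total_c, 1e-12)
        w_i = max(n - total_c, 1e-12)
        prior_c = total_c / n
        mu_c = sum(p * s for p, s in zip(post, scores)) / w_c
        mu_i = sum((1.0 - p) * s for p, s in zip(post, scores)) / w_i
        var_c = sum(p * (s - mu_c) ** 2 for p, s in zip(post, scores)) / w_c
        var_i = sum((1.0 - p) * (s - mu_i) ** 2
                    for p, s in zip(post, scores)) / w_i
        sd_c = max(math.sqrt(var_c), 1e-3)  # floor avoids a collapsed component
        sd_i = max(math.sqrt(var_i), 1e-3)
        # A tiny pseudocount keeps every NTT class possible in both components.
        ntt_c = [(sum(p for p, t in zip(post, ntt) if t == j) + 1e-6)
                 / (w_c + 3e-6) for j in range(3)]
        ntt_i = [(sum(1.0 - p for p, t in zip(post, ntt) if t == j) + 1e-6)
                 / (w_i + 3e-6) for j in range(3)]
    return post


def filter_by_error_rate(probs: List[float], max_error: float) -> List[int]:
    """Accept indices best-first while the expected false identification rate,
    the mean of (1 - p) over accepted assignments, stays <= max_error."""
    accepted: List[int] = []
    expected_false = 0.0
    for k in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        if (expected_false + 1.0 - probs[k]) / (len(accepted) + 1) > max_error:
            break  # probs are descending, so the running mean only grows
        expected_false += 1.0 - probs[k]
        accepted.append(k)
    return accepted


if __name__ == "__main__":
    # Made-up toy data: eight low-scoring and six high-scoring top hits.
    scores = [-0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4,
              3.0, 3.2, 3.4, 3.5, 3.6, 3.8]
    ntt = [0, 1, 0, 1, 0, 1, 2, 0, 2, 2, 2, 1, 2, 2]
    probs = em_peptide_probabilities(scores, ntt)
    print([round(p, 3) for p in probs])
    print(filter_by_error_rate(probs, 0.01))
```

Because the E-step only multiplies per-class densities, swapping the Gaussian for a different score distribution, or adding further features, would change only those density terms and the matching M-step updates.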
Pages: 5383-5392
Number of pages: 10
References (35 in total)
  • [1] Mass spectrometry in proteomics
    Aebersold, R
    Goodlett, DR
    [J]. CHEMICAL REVIEWS, 2001, 101 (02) : 269 - 295
  • [2] Bafna V, 2001, Bioinformatics, V17 Suppl 1, pS13
  • [3] Maximum likelihood from incomplete data via the EM algorithm
    Dempster, AP
    Laird, NM
    Rubin, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [4] An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database
    Eng, JK
    McCormack, AL
    Yates, JR
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) : 976 - 989
  • [5] Base-calling of automated sequencer traces using phred. II. Error probabilities
    Ewing, B
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 186 - 194
  • [6] Base-calling of automated sequencer traces using phred. I. Accuracy assessment
    Ewing, B
    Hillier, L
    Wendl, MC
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 175 - 185
  • [7] Identifying the proteome: software tools
    Fenyö, D
    [J]. CURRENT OPINION IN BIOTECHNOLOGY, 2000, 11 (04) : 391 - 395
  • [8] Field HI, 2002, PROTEOMICS, V2, P36, DOI 10.1002/1615-9861(200201)2:1<36::AID-PROT36>3.3.CO;2-N
  • [9] Functional organization of the yeast proteome by systematic analysis of protein complexes
    Gavin, AC
    Bösche, M
    Krause, R
    Grandi, P
    Marzioch, M
    Bauer, A
    Schultz, J
    Rick, JM
    Michon, AM
    Cruciat, CM
    Remor, M
    Höfert, C
    Schelder, M
    Brajenovic, M
    Ruffner, H
    Merino, A
    Klein, K
    Hudak, M
    Dickson, D
    Rudi, T
    Gnau, V
    Bauch, A
    Bastuck, S
    Huhse, B
    Leutwein, C
    Heurtier, MA
    Copley, RR
    Edelmann, A
    Querfurth, E
    Rybin, V
    Drewes, G
    Raida, M
    Bouwmeester, T
    Bork, P
    Seraphin, B
    Kuster, B
    Neubauer, G
    Superti-Furga, G
    [J]. NATURE, 2002, 415 (6868) : 141 - 147