A statistical basis for testing the significance of mass spectrometric protein identification results

Cited by: 90
Authors
Eriksson, J [1]
Chait, BT [1]
Fenyö, D [1]
Affiliations
[1] Rockefeller Univ, New York, NY 10021 USA
DOI
10.1021/ac990792j
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect matches between the proteolytic peptide masses from an MS peptide map and the theoretical proteolytic peptide masses of the proteins in a genome database. The number of masses that match is used to compute a score, S, for each protein, and the protein that yields the best score is taken as the identification result. There is a risk of obtaining a false result, because masses determined by MS are not unique; i.e., each mass in a peptide map can randomly match one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be distinguished from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject the null hypothesis, H_0: "the result is false". The significance is tested by comparing an experimental score, S_E, with a critical score, S_C, required for a significant result at the level α. If S_E ≥ S_C, H_0 is rejected. f(S) and S_C were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, S_C, was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With S_C known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
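The decision rule in the abstract (reject H_0 when S_E ≥ S_C) can be illustrated with a minimal Python sketch. It assumes the scores of false (random) identifications have already been collected from a simulation, and it assumes that S_C is taken as the smallest simulated score whose upper-tail fraction does not exceed α; the abstract does not spell out this detail. The names `critical_score` and `is_significant` are hypothetical and do not come from the paper, and the scoring function and peptide-map simulation are not shown.

```python
from bisect import bisect_left
from typing import Sequence


def critical_score(null_scores: Sequence[float], alpha: float) -> float:
    """Estimate a critical score S_C from simulated false-identification scores.

    Assumption (not from the paper): S_C is the smallest simulated score s
    for which the fraction of null scores >= s is at most alpha.
    Returns infinity if no simulated score satisfies that condition.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not null_scores:
        raise ValueError("null_scores must not be empty")

    ordered = sorted(null_scores)
    n = len(ordered)
    for s in ordered:
        # Number of null scores greater than or equal to s.
        tail_count = n - bisect_left(ordered, s)
        if tail_count / n <= alpha:
            return s
    return float("inf")


def is_significant(s_e: float, s_c: float) -> bool:
    """Reject H_0 ("the result is false") when S_E >= S_C."""
    return s_e >= s_c


if __name__ == "__main__":
    # Placeholder null scores standing in for a real simulation.
    simulated = list(range(1, 101))
    s_c = critical_score(simulated, alpha=0.05)
    print(s_c, is_significant(97, s_c))
```

In practice the null scores would be regenerated for each set of experimental constraints listed in the abstract (number of masses, mass accuracy, incomplete cleavage, protein mass range, genome size), since S_C depends on all of them.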
Pages: 999-1005
Page count: 7