Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics

Times cited: 117

Authors:

Choi, Hyungwon [1,2]

Nesvizhskii, Alexey I. [1,3]

Affiliations:

[1] Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA

[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA

[3] Univ Michigan, Ctr Computat Med & Biol, Ann Arbor, MI 48109 USA

Source:

JOURNAL OF PROTEOME RESEARCH | 2008 / Vol. 7 / Issue 1

Keywords:

mass spectrometry; peptide identification; protein sequence database searching; statistical validation; semisupervised modeling; decoy sequences

DOI:

10.1021/pr070542g

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification code:

071010; 081704

Abstract:

Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.
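The combination of mixture modeling and decoys described in the abstract can be illustrated with a toy example. The following is a minimal sketch under simplifying assumptions, not the authors' PeptideProphet implementation: it assumes one discriminant score per peptide assignment and models both the correct and the incorrect score distributions as normal, whereas the published model is richer. Decoy matches are treated as labeled (known incorrect) assignments and target matches as unlabeled ones, which is what makes the EM semisupervised. The function names (`semisupervised_em`, `estimated_fdr`, `decoy_fdr`) and the simulated scores are hypothetical. `decoy_fdr` shows the simple decoy-counting estimate for comparison and assumes a decoy database of the same size as the target database (for example, reversed sequences).

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation; the SD is floored to avoid collapse."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    mu = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total
    return mu, max(math.sqrt(var), 1e-3)


def semisupervised_em(target_scores, decoy_scores, n_iter=200, tol=1e-8):
    """Two-component normal mixture on target scores; decoys are labeled incorrect.

    Returns (pi1, (mu0, sd0), (mu1, sd1), probs), where pi1 is the estimated
    fraction of correct target assignments and probs[i] is the posterior
    probability that target_scores[i] is correct.
    """
    # Initialise the incorrect component from the decoys and the correct
    # component from the upper half of the target scores.
    mu0, sd0 = weighted_mean_sd(decoy_scores, [1.0] * len(decoy_scores))
    upper = sorted(target_scores)[len(target_scores) // 2:]
    mu1, sd1 = weighted_mean_sd(upper, [1.0] * len(upper))
    pi1 = 0.5
    probs = [pi1] * len(target_scores)

    for _ in range(n_iter):
        # E-step: only target assignments are unlabeled; decoy responsibilities
        # stay fixed at 0 (known incorrect) and are never recomputed.
        new_probs = []
        for s in target_scores:
            a = pi1 * normal_pdf(s, mu1, sd1)
            b = (1.0 - pi1) * normal_pdf(s, mu0, sd0)
            new_probs.append(a / (a + b) if a + b > 0 else 0.0)

        # M-step: the mixing proportion comes from the targets alone, while the
        # incorrect component is fitted to decoys plus down-weighted targets.
        pi1 = sum(new_probs) / len(new_probs)
        mu1, sd1 = weighted_mean_sd(target_scores, new_probs)
        mu0, sd0 = weighted_mean_sd(
            list(target_scores) + list(decoy_scores),
            [1.0 - p for p in new_probs] + [1.0] * len(decoy_scores),
        )

        converged = max(abs(p - q) for p, q in zip(new_probs, probs)) < tol
        probs = new_probs
        if converged:
            break
    return pi1, (mu0, sd0), (mu1, sd1), probs


def estimated_fdr(probs, threshold):
    """Expected FDR among assignments with posterior >= threshold: mean of (1 - p)."""
    accepted = [p for p in probs if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


def decoy_fdr(target_scores, decoy_scores, cutoff):
    """Decoy-counting estimate: #decoys / #targets at or above the score cutoff."""
    n_target = sum(1 for s in target_scores if s >= cutoff)
    n_decoy = sum(1 for s in decoy_scores if s >= cutoff)
    return n_decoy / n_target if n_target else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated (hypothetical) scores: 300 incorrect and 200 correct target
    # assignments, plus 300 decoy assignments drawn like the incorrect ones.
    targets = [rng.gauss(0.0, 1.0) for _ in range(300)]
    targets += [rng.gauss(4.0, 1.0) for _ in range(200)]
    decoys = [rng.gauss(0.0, 1.0) for _ in range(300)]
    pi1, incorrect, correct, probs = semisupervised_em(targets, decoys)
    print(f"estimated fraction correct: {pi1:.2f}")
    print(f"posterior-based FDR at p >= 0.9: {estimated_fdr(probs, 0.9):.3f}")
    print(f"decoy-count FDR at score >= 2.0: {decoy_fdr(targets, decoys, 2.0):.3f}")
```

Fixing the decoy responsibilities at zero lets the decoys anchor the incorrect-assignment component, while the mixing proportion is estimated from the target scores alone. The posterior probabilities then give an FDR estimate for any probability threshold as the average of 1 - p over the accepted assignments.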


Pages: 254-265

Number of pages: 12

References: 43