Proteome Coverage Prediction for Integrated Proteomics Datasets

Cited by: 8
Authors
Claassen, Manfred [1, 2]
Aebersold, Ruedi [3]
Buhmann, Joachim M. [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Inst Mol Syst Biol, Zurich, Switzerland
[3] Ctr Syst Physiol & Metab Dis, Zurich, Switzerland
Keywords
algorithms; computational molecular biology; tandem mass spectrometry; Dirichlet processes; statistical model; identifications; distributions; mixtures; search
DOI
10.1089/cmb.2010.0261
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Comprehensive characterization of a proteome defines a fundamental goal in proteomics. In order to maximize proteome coverage for a complex protein mixture, i.e., to identify as many proteins as possible, various different fractionation experiments are typically performed and the individual fractions are subjected to mass spectrometric analysis. The resulting data are integrated into large and heterogeneous datasets. Proteome coverage prediction refers to the task of extrapolating the number of protein discoveries by future measurements conditioned on a sequence of already performed measurements. Proteome coverage prediction at an early stage enables experimentalists to design and plan efficient proteomics studies. To date, there does not exist any method that reliably predicts proteome coverage from integrated datasets. We present a generalized hierarchical Pitman-Yor process model that explicitly captures the redundancy within integrated datasets. The accuracy of our approach for proteome coverage prediction is assessed by applying it to an integrated proteomics dataset for the bacterium L. interrogans. The proposed procedure outperforms ad hoc extrapolation methods and prediction methods designed for non-integrated datasets. Furthermore, the maximally achievable proteome coverage is estimated for the experimental setup underlying the L. interrogans dataset. We discuss the implications of our results for determining rational stop criteria and their influence on the design of efficient and reliable proteomics studies.
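As a rough illustration of what extrapolating protein discoveries can look like, the sketch below applies the predictive rule of a plain, single-level Pitman-Yor process: after n identifications covering k distinct proteins, the next identification is a new protein with probability (concentration + discount * k) / (concentration + n). This is not the generalized hierarchical model described in the abstract, which additionally accounts for redundancy across integrated experiments. The function name, the restriction to a positive concentration, and all numeric values are assumptions made for illustration only.

```python
def expected_distinct(n_seen, k_seen, n_future, discount, concentration):
    """Expected number of distinct proteins after n_future more identifications.

    Uses the single-level Pitman-Yor predictive rule. Because the probability
    of a new discovery is linear in the current count of distinct proteins,
    the expectation can be propagated one step at a time.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        # Simplifying assumption of this sketch: keeps every denominator positive.
        raise ValueError("concentration must be positive in this sketch")

    expected_k = float(k_seen)
    for i in range(n_future):
        # Probability that identification number n_seen + i + 1 is a new protein.
        p_new = (concentration + discount * expected_k) / (concentration + n_seen + i)
        expected_k += p_new
    return expected_k


# Hypothetical numbers: 1000 identifications so far covering 300 proteins.
# The discount and concentration would normally be fitted to the data.
for extra in (0, 1000, 5000):
    print(extra, round(expected_distinct(1000, 300, extra, 0.6, 50.0), 1))
```

Calling the function with increasing values of n_future traces out a predicted coverage curve. A curve that flattens suggests that further measurements would yield few new proteins, which is the kind of evidence a stop criterion would rely on.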
Pages: 283-293
Page count: 11