Clustering millions of tandem mass spectra

Cited by: 211
Authors
Frank, Ari M. [1]
Bandeira, Nuno [1]
Shen, Zhouxin [2]
Tanner, Stephen [3]
Briggs, Steven P. [2]
Smith, Richard D. [4]
Pevzner, Pavel A. [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Biol, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[4] Pacific NW Natl Lab, Biol Sci Div, Richland, WA 99352 USA
Keywords
clustering; MS/MS; database search; spectral archives; spectral libraries
DOI
10.1021/pr070361e
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra significantly speeds up MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) that can reduce the number of spectra submitted for further analysis by an order of magnitude. An MS/MS database search of clustered spectra yields fewer spurious hits to the database and increases the number of peptide identifications compared with regular nonclustered searches. Our open-source software, MS-Clustering, is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
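The idea summarized in the abstract, grouping spectra of the same peptide and keeping one representative per group, can be illustrated with a small sketch. This is not the MS-Clustering algorithm, whose details are not given in this record. It is a generic greedy clustering that assumes spectra are lists of (m/z, intensity) peaks, compares them by cosine similarity of binned peak vectors, and only compares spectra with similar precursor m/z. All names and constants (BIN_WIDTH, PRECURSOR_TOL, SIM_THRESHOLD, cluster, and so on) are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

# All three constants are illustrative assumptions, not values from the paper.
BIN_WIDTH = 1.0       # m/z bin width
PRECURSOR_TOL = 2.0   # max precursor m/z difference within a cluster
SIM_THRESHOLD = 0.7   # min cosine similarity to join a cluster


def unit(vec):
    """Scale a sparse vector (bin -> value) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def binned(peaks):
    """Convert (m/z, intensity) peaks to a unit-length sparse binned vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        # Square-root scaling damps the influence of a few dominant peaks.
        vec[int(mz / BIN_WIDTH)] += math.sqrt(intensity)
    return unit(vec)


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(x * v.get(b, 0.0) for b, x in u.items())


def cluster(spectra):
    """Greedy single-pass clustering of (precursor_mz, peaks) tuples.

    Each cluster keeps the sum of its members' binned vectors; the
    normalized sum serves as the cluster's representative spectrum.
    """
    clusters = []
    for idx, (precursor, peaks) in enumerate(spectra):
        vec = binned(peaks)
        best, best_sim = None, SIM_THRESHOLD
        for c in clusters:
            # The cluster precursor is that of its first member.
            if abs(c["precursor"] - precursor) > PRECURSOR_TOL:
                continue
            sim = cosine(vec, unit(c["sum"]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"precursor": precursor, "sum": dict(vec), "members": [idx]})
        else:
            for b, x in vec.items():
                best["sum"][b] = best["sum"].get(b, 0.0) + x
            best["members"].append(idx)
    return clusters
```

In this sketch, only the normalized `sum` vector of each cluster would be passed on to a database search, so the number of searched spectra equals the number of clusters rather than the number of input spectra. The single-pass design compares each spectrum against every cluster, which is too slow for millions of spectra; a realistic implementation would at least index clusters by precursor m/z.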
Pages: 113-122
Page count: 10