Clustering millions of tandem mass spectra

Times cited: 211
Authors
Frank, Ari M. [1 ]
Bandeira, Nuno [1 ]
Shen, Zhouxin [2 ]
Tanner, Stephen [3 ]
Briggs, Steven P. [2 ]
Smith, Richard D. [4 ]
Pevzner, Pavel A. [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Biol, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[4] Pacific NW Natl Lab, Biol Sci Div, Richland, WA 99352 USA
Keywords
clustering; MS/MS; database search; spectral archives; spectral libraries
DOI
10.1021/pr070361e
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in a significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) that can reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
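The abstract describes the idea only at a high level: find spectra of the same peptide and replace each group with one representative. The Python sketch below illustrates that idea under explicitly stated assumptions. It is not the MS-Clustering algorithm, whose similarity measure and representative-selection rule are not given in this record. Everything in it is hypothetical: the `Spectrum` and `Cluster` types, the 1 Da binning in `binned_vector`, cosine similarity as the comparison, the greedy single-pass assignment in `cluster_spectra`, the `mz_tol` and `min_sim` thresholds, and the use of the cluster medoid as the representative spectrum.

```python
# Hypothetical sketch of redundancy-reducing spectrum clustering;
# NOT the MS-Clustering implementation described in the abstract.
import math
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    """Minimal centroided MS/MS spectrum (hypothetical representation)."""
    spec_id: str
    precursor_mz: float
    charge: int
    peaks: list  # (m/z, intensity) tuples


def binned_vector(spec, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins, then L2-normalize."""
    vec = {}
    for mz, intensity in spec.peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (bin -> weight)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(b, 0.0) for b, w in u.items())


@dataclass
class Cluster:
    members: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def representative(self):
        """Medoid: the member with the highest summed similarity to all members."""
        best = max(
            range(len(self.members)),
            key=lambda i: sum(cosine(self.vectors[i], w) for w in self.vectors),
        )
        return self.members[best]


def cluster_spectra(spectra, mz_tol=1.0, min_sim=0.7):
    """Greedy single pass: join the first compatible cluster, else open a new one.

    A spectrum is compatible with a cluster when its charge matches the
    cluster's first member (seed), its precursor m/z is within mz_tol of the
    seed, and its cosine similarity to the seed is at least min_sim.
    """
    clusters = []
    for spec in sorted(spectra, key=lambda s: s.precursor_mz):
        vec = binned_vector(spec)
        for cl in clusters:
            seed = cl.members[0]
            if (seed.charge == spec.charge
                    and abs(seed.precursor_mz - spec.precursor_mz) <= mz_tol
                    and cosine(cl.vectors[0], vec) >= min_sim):
                cl.members.append(spec)
                cl.vectors.append(vec)
                break
        else:  # no compatible cluster found
            clusters.append(Cluster([spec], [vec]))
    return clusters


if __name__ == "__main__":
    spectra = [
        Spectrum("a", 500.25, 2, [(200.1, 10), (300.2, 50), (400.3, 30)]),
        Spectrum("b", 500.30, 2, [(200.2, 12), (300.1, 45), (400.4, 35)]),
        Spectrum("c", 500.28, 2, [(150.0, 40), (250.0, 40)]),
        Spectrum("d", 812.40, 3, [(200.1, 10), (300.2, 50)]),
    ]
    clusters = cluster_spectra(spectra)
    print([[s.spec_id for s in cl.members] for cl in clusters])
    # prints [['a', 'b'], ['c'], ['d']]
    reps = [cl.representative() for cl in clusters]
    print(f"{len(spectra)} spectra -> {len(reps)} representatives")
```

In this toy run, spectra "a" and "b" share charge, precursor m/z, and peak pattern, so they collapse into one cluster; "c" has a matching precursor but a different fragment pattern, and "d" has a different charge, so each stays separate. The linear scan over all clusters makes the sketch roughly quadratic in the number of spectra; data sets of the size mentioned in the abstract (over 10 million spectra) would require something like indexing clusters by charge and precursor m/z so that only nearby candidates are compared.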
Pages: 113-122
Number of pages: 10