Implementation and application of a versatile clustering tool for tandem mass spectrometry data

Cited by: 23
Authors
Flikka, Kristian
Meukens, Jeroen
Helsens, Kenny
Vandekerckhove, Joel
Eidhammer, Ingvar
Gevaert, Kris
Martens, Lennart
Affiliations
[1] Univ Bergen, Bergen Ctr Computat Sci, Computat Biol Unit, N-5008 Bergen, Norway
[2] Univ Bergen, Proteom Unit, Bergen, Norway
[3] Univ Bergen, Dept Informat, N-5008 Bergen, Norway
[4] VIB, Dept Med Prot Res, Ghent, Belgium
[5] Univ Ghent, Dept Biochem, Ghent, Belgium
Keywords
Bioinformatics; mass spectrometry; spectrum clustering;
DOI
10.1002/pmic.200700160
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.
Pages: 3245-3258
Page count: 14