GutenTag: High-throughput sequence tagging via an empirically derived fragmentation model

Cited by: 207
Authors
Tabb, DL [1]
Saraf, A [1]
Yates, JR [1]
Affiliations
[1] Scripps Res Inst, Dept Cell Biol, SR11, La Jolla, CA 92037 USA
DOI
10.1021/ac0347462
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Shotgun proteomics is a powerful tool for identifying the protein content of complex mixtures via liquid chromatography and tandem mass spectrometry. The most widely used class of algorithms for analyzing mass spectra of peptides has been database search software such as SEQUEST. A new sequence tag database search algorithm, called GutenTag, makes it possible to identify peptides with unknown posttranslational modifications or sequence variations. This software automates the process of inferring partial sequence "tags" directly from the spectrum and efficiently examines a sequence database for peptides that match these tags. When multiple candidate sequences result from the database search, the software evaluates which is the best match by a rapid examination of spectral fragment ions. We compare GutenTag's accuracy to that of SEQUEST on a defined protein mixture, showing that both modified and unmodified peptides can be successfully identified by this approach. GutenTag analyzed 33 000 spectra from a human lens sample, identifying peptides that were missed in prior SEQUEST analysis due to sequence polymorphisms and posttranslational modifications. The software is available under license; visit http://fields.scripps.edu for information.
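The tag-then-search idea described in the abstract can be illustrated with a minimal Python sketch. This is not GutenTag's actual implementation: the function names (`infer_tags`, `search_database`), the reduced residue-mass table, the fixed mass tolerance, and the assumption of a clean singly charged b-ion ladder are all hypothetical simplifications. The empirically derived fragmentation model, flanking-mass checks, and final candidate scoring mentioned in the abstract are omitted.

```python
# Monoisotopic residue masses in daltons (a small illustrative subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def infer_tags(peaks, tag_len=3, tol=0.02):
    """Return every tag of length tag_len readable from the peak list.

    A tag is read by walking from peak to peak wherever the m/z gap
    matches a residue mass within tol. Assumes (hypothetically) that
    all peaks belong to one singly charged b-ion series.
    """
    peaks = sorted(peaks)
    tags = set()

    def extend(last_mz, tag):
        if len(tag) == tag_len:
            tags.add(tag)
            return
        for mz in peaks:
            gap = mz - last_mz
            if gap <= 0:
                continue
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    extend(mz, tag + residue)

    for start in peaks:
        extend(start, "")
    return tags


def search_database(tags, peptides):
    """Keep peptides that contain at least one inferred tag as a substring."""
    return [p for p in peptides if any(t in p for t in tags)]


# b1..b4 ions of the made-up peptide GAVLK (residue sums plus one proton).
spectrum = [58.02874, 129.06585, 228.13426, 341.21832]
tags = infer_tags(spectrum)
hits = search_database(tags, ["GAVLK", "PEPTIDE", "AVLS"])
```

Because only the tag has to match exactly, a candidate peptide can still be retrieved when a modification or substitution elsewhere in the sequence shifts its total mass, which is the property the abstract highlights.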
Pages: 6415-6421
Page count: 7