A proteogenomic update to Yersinia: enhancing genome annotation

Citations: 42
Authors
Payne, Samuel H. [1]
Huang, Shih-Ting [1]
Pieper, Rembert [1]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
Source
BMC GENOMICS | 2010 / Vol. 11
Funding
National Institutes of Health (USA);
Keywords
MASS-SPECTROMETRY; SEQUENCE; PROTEOME; PEPTIDES; IDENTIFICATION; TEMPERATURE; PROTEINS; GENE;
DOI
10.1186/1471-2164-11-460
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins. Results: The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations. Conclusions: In total, 141 gene models were altered and have been updated in RefSeq and GenBank, which can be accessed seamlessly through any NCBI tool (e.g., BLAST) or downloaded directly. Along with the improved gene models, we discover new, more accurate means of identifying signal peptides in proteomics data.
Pages: 10