Proteogenomic mapping as a complementary method to perform genome annotation

Cited by: 266
Authors
Jaffe, JD
Berg, HC
Church, GM
Affiliations
[1] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[2] Harvard Univ, Dept Mol & Cellular Biol, Cambridge, MA 02138 USA
[3] Rowland Inst Sci Inc, Cambridge, MA 02142 USA
Keywords
mass spectrometry; Mycoplasma pneumoniae; open reading frame determination; proteogenomic mapping
DOI
10.1002/pmic.200300511
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) (i.e., gene prediction) in these genomes is an important task and is most often performed computationally based on features in the nucleic acid sequence. Using recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on expressed protein-based evidence. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bound by traditional genetic signals to generate a "proteogenomic map". We were able to generate an ORF model for M. pneumoniae strain FH using proteomic data with a high correlation to models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We were also able to detect several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence that would suggest that certain predicted ORFs are bogus. Some of these differences may be a result of the strain analyzed but demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation, and a prerequisite for proteome quantitation and in vivo interaction measures.
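The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' search strategy: it assumes exact string matching of an already-identified peptide against a six-frame translation, and it extends each hit only to the flanking stop codons (start-codon selection and other genetic signals are left out). The function names `translate`, `reverse_complement`, and `map_peptide` are hypothetical. The codon table follows NCBI translation table 4, in which TGA encodes tryptophan in Mycoplasma rather than a stop.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 4 (Mycoplasma/Spiroplasma): TGA encodes Trp, not stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Find exact peptide matches in all six frames and extend each
    hit to the stop-to-stop open frame that contains it.

    Returns a list of dicts with 0-based, half-open forward-strand
    nucleotide coordinates (stop codons excluded).
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                # Extend upstream to just after the previous stop (or frame start).
                left = protein.rfind("*", 0, pos) + 1
                # Extend downstream to the next stop (or frame end).
                right = protein.find("*", pos)
                if right == -1:
                    right = len(protein)
                start, end = offset + 3 * left, offset + 3 * right
                if strand == "-":
                    # Convert reverse-strand coordinates to the forward strand.
                    start, end = n - end, n - start
                hits.append({
                    "strand": strand,
                    "frame": offset,
                    "start": start,
                    "end": end,
                    "orf": protein[left:right],
                })
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    toy_genome = "TAAATGAAATGAGGTCTGCCGTAG"
    for hit in map_peptide(toy_genome, "KWGL"):
        print(hit)
```

In practice peptides come from tandem mass spectra scored against the translated genome rather than from exact string lookup, so this sketch only shows the hit-then-extend logic.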
Pages: 59-77
Page count: 19
Related papers
27 in total
[1]   RAPID CDNA SEQUENCING (EXPRESSED SEQUENCE TAGS) FROM A DIRECTIONALLY CLONED HUMAN INFANT BRAIN CDNA LIBRARY [J].
ADAMS, MD ;
SOARES, MB ;
KERLAVAGE, AR ;
FIELDS, C ;
VENTER, JC .
NATURE GENETICS, 1993, 4 (04) :373-386
[2]   Crystal structure of HPr kinase/phosphatase from Mycoplasma pneumoniae [J].
Allen, GS ;
Steinhauer, K ;
Hillen, W ;
Stülke, J ;
Brennan, RG .
JOURNAL OF MOLECULAR BIOLOGY, 2003, 326 (04) :1203-1217
[3]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[4]  
ALUOTTO B B, 1970, International Journal of Systematic Bacteriology, V20, P35
[5]   Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames [J].
Dandekar, T ;
Huynen, M ;
Regula, JT ;
Ueberle, B ;
Zimmermann, CU ;
Andrade, MA ;
Doerks, T ;
Sánchez-Pulido, L ;
Snel, B ;
Suyama, M ;
Yuan, YP ;
Herrmann, R ;
Bork, P .
NUCLEIC ACIDS RESEARCH, 2000, 28 (17) :3278-3288
[6]   Improved microbial gene identification with GLIMMER [J].
Delcher, AL ;
Harmon, D ;
Kasif, S ;
White, O ;
Salzberg, SL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (23) :4636-4641
[7]   PHOSPHORYLATION OF CYTADHERENCE-ACCESSORY PROTEINS IN MYCOPLASMA-PNEUMONIAE [J].
DIRKSEN, LB ;
KREBES, KA ;
KRAUSE, DC .
JOURNAL OF BACTERIOLOGY, 1994, 176 (24) :7499-7505
[8]   Mycoplasma pneumoniae P1 type 1-and type 2-specific sequences within the P1 cytadhesin gene of individual strains [J].
Dorigo-Zetsma, JW ;
Wilbrink, B ;
Dankert, J ;
Zaat, SAJ .
INFECTION AND IMMUNITY, 2001, 69 (09) :5612-5618
[9]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[10]  
GYGI MP, 2002, PROTEIN ANAL LAB MAN