Proteogenomics to discover the full coding content of genomes: A computational perspective

Cited by: 124
Authors
Castellana, Natalie [1]
Bafna, Vineet [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Funding
National Science Foundation (USA);
Keywords
Proteogenomics; Mass spectrometry; Gene annotation; TANDEM MASS-SPECTRA; HIDDEN MARKOV MODEL; POSTTRANSLATIONAL MODIFICATIONS; PROTEIN IDENTIFICATION; STATISTICAL-MODEL; SCORING FUNCTION; SIGNAL PEPTIDES; GENE PREDICTION; SPECTROMETRY; SEQUENCE;
DOI
10.1016/j.jprot.2010.06.007
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice forms, including alternative splicing, and prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including Arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction. © 2010 Elsevier B.V. All rights reserved.
Pages: 2124-2135
Page count: 12