Improving gene annotation using peptide mass spectrometry

Cited by: 136
Authors
Tanner, Stephen [1]
Shen, Zhouxin
Ng, Julio
Florea, Liliana
Guigo, Roderic
Briggs, Steven P.
Bafna, Vineet
Affiliations
[1] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Biol, La Jolla, CA 92093 USA
[3] George Washington Univ, Dept Comp Sci, Washington, DC 20052 USA
[4] Ctr Regulacio Genom, Barcelona 08003, Spain
[5] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
DOI
10.1101/gr.5646507
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.
Pages: 231-239
Number of pages: 9