Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome

Times cited: 94
Authors
Brosch, Markus [1]
Saunders, Gary I. [1]
Frankish, Adam [1]
Collins, Mark O. [1]
Yu, Lu [1]
Wright, James [1]
Verstraten, Ruth [1]
Adams, David J. [1]
Harrow, Jennifer [1]
Choudhary, Jyoti S. [1]
Hubbard, Tim [1]
Affiliation
[1] Wellcome Trust Sanger Institute, Cambridge CB10 1SA, England
Funding
Wellcome Trust (UK)
Keywords
POSTERIOR ERROR PROBABILITIES; MASS-SPECTROMETRY; PEPTIDE IDENTIFICATION; DROSOPHILA-MELANOGASTER; ANNOTATION; PREDICTION; DATABASE; SPECTRA; DUPLICATION; VALIDATION
DOI
10.1101/gr.114272.110
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching more than 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
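The claim that a processed pseudogene is translated rests on peptides that match that locus and no other, since a peptide shared with the parent gene cannot show which copy produced it. A minimal sketch of such a uniqueness check follows. It is not the authors' pipeline: it assumes translated sequences are available as plain strings keyed by a locus ID, and the function names (`map_peptides`, `unique_peptides`) and toy sequences are hypothetical. It treats isoleucine and leucine as interchangeable, because the two residues have identical mass and standard MS/MS spectra do not distinguish them.

```python
from collections import defaultdict


def normalize(seq: str) -> str:
    """Upper-case a sequence and collapse isoleucine (I) onto leucine (L),
    because the two residues have identical mass and standard MS/MS spectra
    cannot tell them apart."""
    return seq.upper().replace("I", "L")


def map_peptides(peptides, proteins):
    """Return {peptide: set of locus IDs whose translation contains it}.

    `proteins` maps a (hypothetical) locus ID to its translated sequence.
    Peptides that match no translation are omitted from the result.
    """
    norm_proteins = {locus: normalize(seq) for locus, seq in proteins.items()}
    hits = defaultdict(set)
    for pep in peptides:
        norm_pep = normalize(pep)
        for locus, seq in norm_proteins.items():
            if norm_pep in seq:  # exact substring match after I/L collapsing
                hits[pep].add(locus)
    return dict(hits)


def unique_peptides(peptides, proteins):
    """Group peptides that match exactly one locus under that locus.

    Shared peptides (matching several loci) are discarded, since they cannot
    show which of the loci was actually translated.
    """
    by_locus = defaultdict(list)
    for pep, loci in map_peptides(peptides, proteins).items():
        if len(loci) == 1:
            (locus,) = loci
            by_locus[locus].append(pep)
    return dict(by_locus)


if __name__ == "__main__":
    # Invented toy data: a parent gene and a pseudogene copy that are identical
    # except for one T/A substitution near the C-terminus (the I/L difference
    # at position 6 is invisible to MS and is collapsed by normalize()).
    proteins = {
        "Parent": "MKTAYIAKQRDEGSTK",
        "Pseudo": "MKTAYLAKQRDEGSAK",
    }
    peptides = ["TAYIAK", "DEGSAK", "DEGSTK"]
    print(unique_peptides(peptides, proteins))
    # → {'Pseudo': ['DEGSAK'], 'Parent': ['DEGSTK']}
```

The brute-force substring scan costs one comparison per peptide and locus, which is fine for illustration; a genome-scale search would more plausibly index the sequences (for example with a suffix array or an Aho-Corasick automaton) instead.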
Pages: 756–767
Page count: 12