AceView: a comprehensive cDNA-supported gene and transcripts annotation

Cited by: 461
Authors
Thierry-Mieg, Danielle [1]
Thierry-Mieg, Jean [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
Keywords
DOI
10.1186/gb-2006-7-s1-s12
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants.
Results: We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode.
Conclusions: Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).
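Illustrative note: the mRNA-level comparison described in the abstract (transcripts treated as chains of introns, best matching pairs found first, each mRNA used only once) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the greedy pairing rule, the ranking by number of shared introns, and the definitions of sensitivity and specificity as exact-match fractions of the reference and predicted sets are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]      # (start, end) genomic coordinates of one intron
Chain = Tuple[Intron, ...]    # a transcript model seen as an ordered chain of introns


def match_intron_chains(
    reference: Dict[str, Chain], predicted: Dict[str, Chain]
) -> List[Tuple[str, str, int]]:
    """Greedily pair transcripts one-to-one, best matches first (assumed rule)."""
    candidates = []
    for ref_id, ref_chain in reference.items():
        ref_introns = set(ref_chain)
        for pred_id, pred_chain in predicted.items():
            shared = len(ref_introns & set(pred_chain))
            if shared:
                identical = ref_chain == pred_chain
                candidates.append((shared, identical, ref_id, pred_id))

    # Identical chains first, then most shared introns; ids break ties.
    candidates.sort(key=lambda c: (not c[1], -c[0], c[2], c[3]))

    used_ref, used_pred, pairs = set(), set(), []
    for shared, _identical, ref_id, pred_id in candidates:
        if ref_id in used_ref or pred_id in used_pred:
            continue  # each mRNA is used only once
        used_ref.add(ref_id)
        used_pred.add(pred_id)
        pairs.append((ref_id, pred_id, shared))
    return pairs


def sensitivity_specificity(
    reference: Dict[str, Chain], predicted: Dict[str, Chain]
) -> Tuple[float, float]:
    """Exact-match fractions of the reference and of the predictions (assumed definitions)."""
    pairs = match_intron_chains(reference, predicted)
    exact = sum(1 for r, p, _ in pairs if reference[r] == predicted[p])
    sensitivity = exact / len(reference) if reference else 0.0
    specificity = exact / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy, made-up coordinates: two reference variants, three predictions.
reference = {
    "r1": ((100, 200), (300, 400)),
    "r2": ((100, 200), (350, 400)),
}
predicted = {
    "p1": ((100, 200), (300, 400)),
    "p2": ((100, 200),),
    "p3": ((500, 600),),
}
pairs = match_intron_chains(reference, predicted)
sens, spec = sensitivity_specificity(reference, predicted)
```

In this toy case only r1 and p1 share an identical intron chain, so one of two reference transcripts and one of three predictions count as exact matches. Single-exon transcripts have empty chains and are never paired in this sketch; a real comparison would need a separate rule for them.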
Pages: 14