EAnnot: A genome annotation tool using experimental evidence

Cited by: 12
Authors
Ding, L [1]
Sabo, A [1]
Berkowicz, N [1]
Meyer, RR [1]
Shotland, Y [1]
Johnson, MR [1]
Pepin, KH [1]
Wilson, RK [1]
Spieth, J [1]
Affiliation
[1] Washington Univ, Sch Med, Genome Sequencing Ctr, St Louis, MO 63110 USA
DOI
10.1101/gr.3152604
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The sequence of any genome becomes most useful for biological experimentation when a complete and accurate gene set is available. Gene prediction programs offer an efficient way to generate an automated gene set. Manual annotation, when performed by experienced annotators, is more accurate and complete than automated annotation. However, it is a laborious and expensive process, and by its nature, introduces a degree of variability not found with automated annotation. EAnnot (Electronic Annotation) is a program originally developed for manually annotating the human genome. It combines the latest bioinformatics tools to extract and analyze a wide range of publicly available data in order to achieve fast and reliable automatic gene prediction and annotation. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals. Here, we compare manual annotation of human chromosome 6 with annotation performed by EAnnot in order to assess the latter's accuracy. EAnnot can readily be applied to manual annotation of other eukaryotic genomes and can be used to rapidly obtain an automated gene set.
Pages: 2503-2509
Number of pages: 7