Using native and syntenically mapped cDNA alignments to improve de novo gene finding

Cited by: 1469
Authors
Stanke, Mario [1]
Diekhans, Mark [1]
Baertsch, Robert [1]
Haussler, David [1]
Affiliation
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
DOI
10.1093/bioinformatics/btn013
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Computational annotation of protein coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and do poorly with certain types of genes. Including additional sources of evidence of the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated, and aligned to the target genome provide evidence of the existence and structure of genes.
Results: We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are gene and transcript annotations from related species syntenically mapped to the target genome using TransMap, evolutionary conservation of DNA, mRNA and ESTs of the target species, and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we were able to predict at least one splice form exactly correctly in 57% of human genes. When evidence from other species and human mRNAs is also used, this number rises to 77%. Syntenic mapping is well-suited to annotating genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than independent positionwise information.
Pages: 637-644
Page count: 8