Computational methods for the identification of genes in vertebrate genomic sequences

Cited by: 178
Author
Claverie, JM
Affiliation
[1] Struct. and Genetic Info. Laboratory, CNRS-EP.91, 13402 Marseille cedex 20
DOI
10.1093/hmg/6.10.1735
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Research into new methods to identify genes in anonymous genomic sequences has been going on for more than 15 years. Over this period of time, the field has evolved from the designing of programs to identify protein coding regions in compact mitochondrial or bacterial genomes, to the challenge of predicting the detailed organization of multi-exon vertebrate genes. The best program currently available perfectly locates more than 80% of the internal coding exons, and only 5% of the predictions do not overlap a real exon. Given such accuracy, computational methods are indeed very useful; however, they do not alleviate the need for experimental validation. If the performances are satisfactory for the identification of the coding moiety of genes (internal coding exons), the determination of the full extent of the transcript (5' and 3' extremities of the gene) and the location of promoter regions are still unreliable. As the human and mouse genome sequencing projects enter a production mode, the fully automated annotation of megabase-long anonymous genomic sequences is the next big challenge in bioinformatics.
Pages: 1735-1744
Number of pages: 10