A dictionary-based approach for gene annotation

Cited by: 15
Authors
Pachter, L
Batzoglou, S
Spitkovsky, VI
Banks, E
Lander, ES
Kleitman, DJ
Berger, B [1]
Affiliations
[1] MIT, Dept Math, Cambridge, MA 02139 USA
[2] MIT, Comp Sci Lab, Cambridge, MA 02139 USA
[3] MIT, Whitehead Inst, Cambridge, MA 02139 USA
[4] MIT, Dept Biol, Cambridge, MA 02139 USA
Keywords
gene recognition; exon prediction; splice site detection; alternative splicing
DOI
10.1089/106652799318364
CLC number (Chinese Library Classification)
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
This paper describes a fast, fully automated, dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein database OWL and the other from the dbEST database. These dictionaries support O(1)-time lookups of tuples (4-tuples for the OWL database and 11-tuples for the dbEST database). The tuples can be used to rapidly find, at every position in an input sequence, the longest match to the database sequences. Such matches provide useful information for locating segments shared between exons, identifying alternative splice sites, and collecting frequency data on long tuples for statistical purposes. The dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
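A minimal sketch of the dictionary idea, assuming a plain Python hash map keyed by k-tuples stands in for the authors' data structure (their tables may instead be direct-address arrays indexed by an integer encoding of each tuple). The names `build_dictionary` and `longest_matches` are hypothetical, and greedily extending each seed hit to the right is one straightforward way to get the longest exact match starting at each query position, not necessarily the paper's procedure. Setting k = 4 mirrors the protein (OWL) dictionary and k = 11 the dbEST dictionary.

```python
from collections import defaultdict


def build_dictionary(db_seqs, k):
    """Map every k-tuple in the database to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for sid, seq in enumerate(db_seqs):
        for off in range(len(seq) - k + 1):
            index[seq[off:off + k]].append((sid, off))
    # Return a plain dict so that looking up an absent tuple never inserts a key.
    return dict(index)


def longest_matches(query, db_seqs, index, k):
    """For each query position i, return (i, length, hit): the longest exact match
    to any database sequence that starts at i, where hit is (sequence index, offset).
    Positions whose k-tuple is absent from the dictionary get (i, 0, None)."""
    results = []
    for i in range(len(query) - k + 1):
        best_len, best_hit = 0, None
        # Expected constant-time lookup of the seed tuple (k is fixed).
        for sid, off in index.get(query[i:i + k], ()):
            seq = db_seqs[sid]
            length = k
            # Extend the seed to the right while query and database residues agree.
            while (i + length < len(query) and off + length < len(seq)
                   and query[i + length] == seq[off + length]):
                length += 1
            if length > best_len:
                best_len, best_hit = length, (sid, off)
        results.append((i, best_len, best_hit))
    return results


if __name__ == "__main__":
    db = ["MKTAYIAKQR", "GGTAYIAKWW"]  # made-up toy protein sequences
    index = build_dictionary(db, k=4)
    for pos, length, hit in longest_matches("XXTAYIAKQRZ", db, index, k=4):
        print(pos, length, hit)
```

On the toy input, the query position starting with `TAYI` seeds hits in both database sequences and extends to an 8-residue match in the first one. Each dictionary lookup costs expected constant time for fixed k; the extension step costs time proportional to the match length.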
Pages: 419-430
Number of pages: 12