Improved splice site detection in Genie

Times cited: 1456
Authors
Reese, MG [1]
Eeckman, FH [1]
Kulp, D [1]
Haussler, D [1]
Affiliation
[1] UNIV CALIF SANTA CRUZ,BASKIN CTR COMP ENGN & COMP SCI,SANTA CRUZ,CA 95064
DOI
10.1089/cmb.1997.4.311
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is determining the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84% in the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.
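To make the nucleotide-level accuracy figures concrete, here is a minimal sketch of how sensitivity and specificity are commonly computed for a single sequence, assuming the convention of Burset and Guigó [4]: sensitivity = TP/(TP+FN) and specificity = TP/(TP+FP), counted over coding nucleotides. The function name `nucleotide_accuracy` and the per-base 'C'/'N' labeling are hypothetical illustrations, not Genie's actual evaluation code.

```python
def nucleotide_accuracy(annotated: str, predicted: str) -> tuple[float, float]:
    """Nucleotide-level sensitivity and specificity of a coding prediction.

    Hypothetical encoding: both arguments are equal-length strings in which
    'C' marks a coding nucleotide and 'N' a non-coding one.
    Assumes the Burset & Guigo convention, where "specificity" is
    TP / (TP + FP), i.e. the fraction of predicted coding bases that are
    truly coding.
    """
    if len(annotated) != len(predicted):
        raise ValueError("annotation and prediction must have equal length")
    pairs = list(zip(annotated, predicted))
    tp = sum(a == "C" and p == "C" for a, p in pairs)  # coding, predicted coding
    fn = sum(a == "C" and p != "C" for a, p in pairs)  # coding, missed
    fp = sum(a != "C" and p == "C" for a, p in pairs)  # non-coding, called coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    sn, sp = nucleotide_accuracy("NNCCCCCCCCNN", "NCCCCCCNNNNN")
    print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

In this toy example the prediction recovers 5 of the 8 annotated coding bases (sensitivity 0.625), and 5 of its 6 predicted coding bases are correct (specificity about 0.833). Note that this "specificity" is what other fields call precision; it is not TN/(TN+FP).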
Pages: 311-323
Page count: 13
Related papers (28 in total)
[1] Auger, IE. Bulletin of Mathematical Biology, 1989, 51:39. DOI 10.1007/BF02458835
[2] Borodovsky, M; McIninch, J. GENMARK: parallel gene recognition for both DNA strands. Computers & Chemistry, 1993, 17(2):123-133.
[3] Brunak, S; Engelbrecht, J; Knudsen, S. Prediction of human mRNA donor and acceptor sites from the DNA sequence. Journal of Molecular Biology, 1991, 220(1):49-65.
[4] Burset, M; Guigo, R. Evaluation of gene structure prediction programs. Genomics, 1996, 34(3):353-367.
[5] Dong, S. Genomics, 1994, 162:705.
[6] Fickett, JW; Tung, CS. Assessment of protein coding measures. Nucleic Acids Research, 1992, 20(24):6441-6450.
[7] Fink, GR. Cell, 1987, 49:355.
[8] Gelfand, MS; Roytberg, MA. Prediction of the exon-intron structure by a dynamic programming approach. BioSystems, 1993, 30(1-3):173-182.
[9] Guigo, R; Knudsen, S; Drake, N; Smith, T. Prediction of gene structure. Journal of Molecular Biology, 1992, 226(1):141-157.
[10] Henderson, J. Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, 1996.