Las Vegas algorithms for gene recognition: Suboptimal and error-tolerant spliced alignment

Cited by: 23
Authors
Sze, SH [1]
Pevzner, PA [1]
Affiliations
[1] UNIV SO CALIF, DEPT MATH, LOS ANGELES, CA 90089
DOI
10.1089/cmb.1997.4.297
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Recently, Gelfand, Mironov and Pevzner (1996) proposed a spliced alignment approach to gene recognition that provides 99% accurate recognition of human genes if a related mammalian protein is available. However, even 99% accurate gene predictions are insufficient for automated sequence annotation in large-scale sequencing projects and therefore have to be complemented by experimental gene verification. One hundred percent accurate gene predictions would lead to a substantial reduction of experimental work on gene identification. Our goal is to develop an algorithm that either predicts an exon assembly with accuracy sufficient for sequence annotation or warns a biologist that the accuracy of a prediction is insufficient and further experimental work is required. We study suboptimal and error-tolerant spliced alignment problems as the first steps towards such an algorithm, and report an algorithm that provides 100% accurate recognition of human genes in 37% of cases (if a related mammalian protein is available). In 52% of genes, the algorithm predicts at least one exon with 100% accuracy.
Pages: 297-309
Number of pages: 13