Performance-guarantee gene predictions via spliced alignment

Cited by: 30
Authors
Mironov, AA
Roytberg, MA
Pevzner, PA [1]
Gelfand, MS
Affiliations
[1] Univ So Calif, Dept Math, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[3] NIIGENETIKA, Natl Biotechnol Ctr, Lab Math Methods, Moscow 113545, Russia
[4] Russian Acad Sci, Inst Prot Res, Pushchino 142292, Moscow Region, Russia
[5] Russian Acad Sci, Inst Math Problems Biol, Pushchino 142292, Moscow Region, Russia
Funding
Russian Foundation for Basic Research
DOI
10.1006/geno.1998.5251
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
An important and still unsolved problem in gene prediction is designing an algorithm that not only predicts genes but also estimates the quality of individual predictions. Since experimental biologists are interested mainly in the reliability of individual predictions (rather than in the average reliability of an algorithm), we attempted to develop a gene recognition algorithm that guarantees a certain quality of predictions. We demonstrate here that the similarity level with a related protein is a reliable quality estimator for the spliced alignment approach to gene recognition. We also study the average performance of the spliced alignment algorithm for different targets on a complete set of human genomic sequences with known relatives and demonstrate that the average performance of the method remains high even for very distant targets. Using plant, fungal, and prokaryotic target proteins to recognize human genes yields accurate predictions, with correlation coefficients of 95, 93, and 91%, respectively. For target proteins with a similarity score above 60%, not only is the average correlation coefficient very high (97% and above), but the quality of each individual prediction is also guaranteed to be at least 82%. This indicates that at this level of similarity the worst-case performance of the spliced alignment algorithm is better than the average-case performance of many statistical gene recognition methods. (C) 1998 Academic Press
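The abstract names the spliced alignment approach and reports accuracy as a correlation coefficient without spelling out either. The following Python sketch is a simplified illustration, not the authors' implementation. It assumes a DNA target scored with hypothetical toy values (MATCH, MISMATCH, INDEL), whereas the method described here aligns against related proteins. Candidate blocks (putative exons) are half-open intervals of the genomic sequence, and the dynamic program picks the chain of non-overlapping blocks whose concatenation aligns best with the target. The function names spliced_alignment and correlation_coefficient are hypothetical, and the accuracy measure assumes the nucleotide-level correlation coefficient of Burset and Guigó (1996).

```python
from math import sqrt
from typing import List, Tuple

Block = Tuple[int, int]  # half-open [start, end) interval of the genomic sequence

# Hypothetical toy scores (an assumption); the method in this paper aligns
# against protein targets rather than DNA.
MATCH, MISMATCH, INDEL = 1, -1, -2


def spliced_alignment(genome: str, blocks: List[Block], target: str):
    """Find the chain of non-overlapping blocks whose concatenation best
    aligns (globally) with target. Assumes at least one non-empty block.
    Returns (score, chain)."""
    blocks = sorted(blocks)  # by start, so every possible predecessor comes first
    m = len(target)
    tables, preds = [], []
    for b, (start, end) in enumerate(blocks):
        seq = genome[start:end]
        # entry[j]: best score on reaching this block's first base with target[:j]
        # consumed; the default starts the chain here, leaving target[:j] gapped.
        entry = [INDEL * j for j in range(m + 1)]
        pred = [None] * (m + 1)
        for p in range(b):
            if blocks[p][1] <= start:  # predecessor must end before b starts
                last_row = tables[p][-1]
                for j in range(m + 1):
                    if last_row[j] > entry[j]:
                        entry[j], pred[j] = last_row[j], p
        # Ordinary global-alignment DP of seq against target, seeded with entry.
        D = [entry]
        for k in range(1, len(seq) + 1):
            prev = D[-1]
            row = [prev[0] + INDEL]
            for j in range(1, m + 1):
                s = MATCH if seq[k - 1] == target[j - 1] else MISMATCH
                row.append(max(prev[j - 1] + s, prev[j] + INDEL, row[j - 1] + INDEL))
            D.append(row)
        tables.append(D)
        preds.append(pred)

    # The best chain ends in whichever block aligns the whole target best.
    best = max(range(len(blocks)), key=lambda i: tables[i][-1][m])
    score = tables[best][-1][m]

    # Trace back through each block's table, then jump to its recorded predecessor.
    chain, cur, j = [], best, m
    while cur is not None:
        start, end = blocks[cur]
        seq, D = genome[start:end], tables[cur]
        k = len(seq)
        while k > 0:
            s = MATCH if j > 0 and seq[k - 1] == target[j - 1] else MISMATCH
            if j > 0 and D[k][j] == D[k - 1][j - 1] + s:
                k, j = k - 1, j - 1
            elif D[k][j] == D[k - 1][j] + INDEL:
                k -= 1
            else:
                j -= 1
        chain.append(blocks[cur])
        cur = preds[cur][j]
    return score, chain[::-1]


def correlation_coefficient(predicted: List[Block], actual: List[Block], length: int) -> float:
    """Nucleotide-level correlation coefficient between predicted and actual
    coding positions in a sequence of the given length."""
    pred = {i for s, e in predicted for i in range(s, e)}
    true = {i for s, e in actual for i in range(s, e)}
    tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
    tn = length - tp - fp - fn
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


genome = "TTATGCCCCCGATTT"
blocks = [(2, 5), (5, 10), (10, 13)]  # "ATG", decoy "CCCCC", "GAT"
score, chain = spliced_alignment(genome, blocks, "ATGGAT")
print(score, chain)  # 6 [(2, 5), (10, 13)]
print(correlation_coefficient(chain, [(2, 5), (10, 13)], len(genome)))  # 1.0
```

Each block's table is seeded with the best final row of any compatible predecessor, so the search costs roughly one alignment per block plus the predecessor scans. In practice, candidate blocks would come from putative exons bounded by candidate splice sites rather than from a hand-written list.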
Pages: 332-339
Number of pages: 8