Protein-coding regions prediction combining similarity searches and conservative evolutionary properties of protein-coding sequences

Cited by: 14
Authors
Rogozin, IB [1]
D'Angelo, D [1]
Milanesi, L [1]
Affiliations
[1] CNR, Ist Tecnol Biomed Avanzate, I-20090 Milan, Italy
Keywords
functional signal; gene detection; local alignment; mismatch; synonymous codon
DOI
10.1016/S0378-1119(98)00509-5
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Identifying genes in a completely new sequence that has no good homology with known protein sequences can be a very complex task. To identify protein-coding regions in such cases, a new method, 'SYNCOD', based on the analysis of conservative evolutionary properties of coding regions, has been developed. The program identifies and uses coding-region homologies with non-annotated (unknown) protein-coding sequences already present in the nucleotide sequence databases, using the alignments produced by BLASTN. For each open reading frame, the ratio of the number of mismatches resulting in synonymous codons to the number of mismatches resulting in non-synonymous codons is estimated. Monte Carlo simulations are then used to estimate the significance of the deviation of this ratio from random behavior. The SYNCOD program has been tested on generated random sequences and on different control sets. The high accuracy of predicting protein-coding regions (the correlation coefficient, CC, varies from 0.67 to 0.79) and the high specificity (the proportion of wrong exons, WE, varies from 0.06 to 0.07) are important features of the suggested approach. The SYNCOD program resides on the ITBA-CNR Web Server and can be used via the Internet (URL: www.itba.mi.cnr.it/webgene). (C) 1999 Elsevier Science B.V. All rights reserved.
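The core idea in the abstract (classifying mismatches in an aligned reading frame as synonymous or non-synonymous, then asking whether the observed excess of synonymous changes could arise by chance) can be illustrated with a minimal Python sketch. This is not the SYNCOD implementation: the function names, the gapless pairwise input, the per-codon counting, the use of the synonymous fraction in place of the raw ratio (to avoid division by zero), and the null model (the same number of substitutions scattered uniformly at random) are all assumptions made for illustration.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_syn_nonsyn(query, subject, frame=0):
    """Count differing codon pairs that are synonymous / non-synonymous.

    Assumes a gapless alignment of two equal-orientation DNA strings.
    Codons containing characters outside T, C, A, G are skipped.
    """
    syn = nonsyn = 0
    n = min(len(query), len(subject))
    for i in range(frame, n - 2, 3):
        q, s = query[i:i + 3], subject[i:i + 3]
        if q == s or q not in CODON_TABLE or s not in CODON_TABLE:
            continue
        if CODON_TABLE[q] == CODON_TABLE[s]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def syn_fraction(syn, nonsyn):
    """Fraction of differing codons that are synonymous (0.0 if none differ)."""
    total = syn + nonsyn
    return syn / total if total else 0.0


def monte_carlo_p(query, subject, frame=0, trials=1000, seed=0):
    """Estimate how often random substitutions look at least as 'coding-like'.

    Hypothetical null model: the observed number of mismatching positions is
    placed uniformly at random along the query, each with a random new base.
    """
    rng = random.Random(seed)
    n = min(len(query), len(subject))
    query, subject = query[:n], subject[:n]
    n_mismatch = sum(a != b for a, b in zip(query, subject))
    observed = syn_fraction(*count_syn_nonsyn(query, subject, frame))
    hits = 0
    for _ in range(trials):
        sim = list(query)
        for pos in rng.sample(range(n), n_mismatch):
            sim[pos] = rng.choice([b for b in BASES if b != query[pos]])
        simulated = syn_fraction(*count_syn_nonsyn(query, "".join(sim), frame))
        if simulated >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    q = "ATGGCTGCTGCTGCT"
    s = "ATGGCCGCAGCGGCT"  # differs from q only at third codon positions
    print(count_syn_nonsyn(q, s))  # (3, 0)
    print(monte_carlo_p(q, s))
```

In this toy example all three mismatches fall on third positions of alanine codons, so every differing codon is synonymous and the simulated p-value is small; applying the same test to each of the reading frames of a real alignment would be one way to pick out the frame that behaves like a coding region.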
Pages: 129-137
Number of pages: 9