Gene recognition in eukaryotic DNA by comparison of genomic sequences

Cited by: 23
Authors
Novichkov, PS [1]
Gelfand, MS [1]
Mironov, AA [1]
Affiliations
[1] State Sci Ctr GosNIIGenet, Moscow 113545, Russia
DOI
10.1093/bioinformatics/17.11.1011
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Sequencing of complete eukaryotic genomes and large syntenic fragments of genomes makes it possible to apply genomic comparison for gene recognition.
Results: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume conservation of the exon-intron structure. Amino acid sequences obtained by the formal translation of candidate exons are aligned instead of nucleotide sequences, which allows for distant comparisons. The algorithm was tested on a sample of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between the actual and predicted genes, were obtained for more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. The latter value increases to 95% if conservation of the exon-intron structure is assumed. This is caused by a large amount of sequence conservation in non-coding regions of the human and mouse genes probably due to regulatory elements.
Pages: 1011-1018
Number of pages: 8
Related papers
43 in total
[31]   Mott R, 1997, COMPUT APPL BIOSCI, V13, P477
[32]   Positionally cloned human disease genes: Patterns of evolutionary conservation and functional motifs [J].
Mushegian, AR ;
Bassett, DE ;
Boguski, MS ;
Bork, P ;
Koonin, EV .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1997, 94 (11) :5831-5836
[33]   Prediction of the exon-intron structure by comparison of genomic sequences [J].
Novichkov, PS ;
Gelfand, MS ;
Mironov, AA .
MOLECULAR BIOLOGY, 2000, 34 (02) :200-206
[34]   NOVICHKOV PS, 2000, P 2 INT C BIO GEN RE, V2, P42
[35]   A dictionary-based approach for gene annotation [J].
Pachter, L ;
Batzoglou, S ;
Spitkovsky, VI ;
Banks, E ;
Lander, ES ;
Kleitman, DJ ;
Berger, B .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :419-430
[36]   Genome annotation assessment in Drosophila melanogaster [J].
Reese, MG ;
Hartzell, G ;
Harris, NL ;
Ohler, U ;
Abril, JF ;
Lewis, SE .
GENOME RESEARCH, 2000, 10 (04) :483-501
[37]   Combinatorial approaches to gene recognition [J].
Roytberg, MA ;
Astakhova, TV ;
Gelfand, MS .
COMPUTERS & CHEMISTRY, 1997, 21 (04) :229-235
[38]   Thacker C, 1999, GENOME RES, V9, P348
[39]   FALSE ASSOCIATION OF HUMAN ESTS [J].
TSAI, JY ;
NAMINGONZALEZ, ML ;
SILVER, LM .
NATURE GENETICS, 1994, 8 (04) :321-322
[40]   Gene structure prediction by spliced alignment of genomic DNA with protein sequences: Increased accuracy by differential splice site scoring [J].
Usuka, J ;
Brendel, V .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 297 (05) :1075-1085