Optimal spliced alignment of homologous cDNA to a genomic DNA template

Times cited: 97
Authors
Usuka, J
Zhu, W
Brendel, V
Affiliations
[1] Iowa State Univ, Dept Zool & Genet, Ames, IA 50011 USA
[2] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/16.3.203
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching.

Results: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity.
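
The idea of scoring sequence matching and splice site strength in one alignment can be illustrated with a small dynamic program. The sketch below is not the authors' formulation; it is a minimal two-state (exon/intron) global alignment under assumed scores. The function names `toy_site_scores` and `spliced_align`, the match/mismatch/gap values, and the GT...AG-only site scorer are all hypothetical stand-ins for a real splice site predictor and scoring scheme.

```python
NEG = float("-inf")

def toy_site_scores(genomic, strong=-0.5, weak=-8.0):
    """Hypothetical stand-in for a splice site predictor (GT...AG rule only).

    donor[i]    scores an intron whose first base is genomic[i].
    acceptor[i] scores an intron whose last base is genomic[i].
    """
    n = len(genomic)
    donor = [strong if genomic[i:i + 2] == "GT" else weak for i in range(n)]
    acceptor = [strong if i >= 1 and genomic[i - 1:i + 1] == "AG" else weak
                for i in range(n)]
    return donor, acceptor

def spliced_align(genomic, cdna, donor, acceptor,
                  match=1.0, mismatch=-1.0, gap=-2.0):
    """Return (score, introns); introns are half-open genomic intervals."""
    n, m = len(genomic), len(cdna)
    # E[i][j]: best score for genomic[:i] vs cdna[:j], ending in an exon.
    # I[i][j]: same prefixes, with genomic[i-1] inside an intron.
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = {}
    E[0][0] = 0.0
    best = lambda cands: max(cands, key=lambda t: t[0])
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                # Open an intron at a donor site, or extend one at no cost.
                I[i][j], back[("I", i, j)] = best([
                    (E[i - 1][j] + donor[i - 1], ("E", i - 1, j)),
                    (I[i - 1][j], ("I", i - 1, j)),
                ])
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                sub = match if genomic[i - 1] == cdna[j - 1] else mismatch
                cands.append((E[i - 1][j - 1] + sub, ("E", i - 1, j - 1)))
            if i > 0:
                cands.append((E[i - 1][j] + gap, ("E", i - 1, j)))
                # Close an intron whose last base is genomic[i-1].
                cands.append((I[i][j] + acceptor[i - 1], ("I", i, j)))
            if j > 0:
                cands.append((E[i][j - 1] + gap, ("E", i, j - 1)))
            E[i][j], back[("E", i, j)] = best(cands)
    # Trace back from the end, collecting intron boundaries.
    introns, end, state = [], None, ("E", n, m)
    while state != ("E", 0, 0):
        prev = back[state]
        if state[0] == "E" and prev[0] == "I":
            end = state[1]
        elif state[0] == "I" and prev[0] == "E":
            introns.append((prev[1], end))
        state = prev
    introns.reverse()
    return E[n][m], introns

# Toy data: two 9-base exons separated by a 12-base GT...AG intron.
exon1, intron, exon2 = "ATGCCCTTC", "GTAAATTTTCAG", "CTTCCATAA"
genomic = exon1 + intron + exon2
donor, acceptor = toy_site_scores(genomic)
score, introns = spliced_align(genomic, exon1 + exon2, donor, acceptor)
```

Because the site scores are simply added to the alignment score, a weak site can still be selected when the flanking sequence matches well, and a strong site can be selected even where matching is poor, which is the trade-off the abstract describes. A practical version would need a real site predictor, affine gaps, and local handling of the genomic ends.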
Pages: 203-211
Page count: 9