PREDICTING INTERNAL EXONS BY OLIGONUCLEOTIDE COMPOSITION AND DISCRIMINANT-ANALYSIS OF SPLICEABLE OPEN READING FRAMES

Cited by: 251
Authors
SOLOVYEV, VV
SALAMOV, AA
LAWRENCE, CB
Affiliation
[1] Department of Cell Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
DOI
10.1093/nar/22.24.5156
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
A new method that predicts internal exon sequences in human DNA has been developed. The method is based on a splice site prediction algorithm that uses a linear discriminant function to combine information about significant triplet frequencies in various functional parts of splice site regions with the preferences of oligonucleotides in protein-coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites. For exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77%, with a specificity of 79%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. This corresponds to a correlation coefficient for exon prediction of 0.87. The precision of this approach is better than that of other methods, and it has been tested on a larger data set. We have also developed a means of predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.
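The nucleotide-level figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give its formulas, so the sketch assumes that "recognition quality" is the fraction of true exon (or intron) nucleotides that are labelled correctly, and that the correlation coefficient is the standard Matthews-style coefficient computed from a 2x2 confusion table. The function name `nucleotide_metrics`, the 'E'/'I' label encoding, and the toy sequences are hypothetical and are not taken from the paper.

```python
import math


def nucleotide_metrics(actual, predicted):
    """Compare two equal-length label strings made of 'E' (exon) and 'I' (intron).

    Returns the confusion counts, the fraction of exon and intron nucleotides
    labelled correctly, and a Matthews-style correlation coefficient.
    All definitions here are assumptions, not the paper's stated formulas.
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(a == "E" and p == "E" for a, p in pairs)  # exon called exon
    tn = sum(a == "I" and p == "I" for a, p in pairs)  # intron called intron
    fp = sum(a == "I" and p == "E" for a, p in pairs)  # intron called exon
    fn = sum(a == "E" and p == "I" for a, p in pairs)  # exon called intron

    exon_rate = tp / (tp + fn) if (tp + fn) else 0.0
    intron_rate = tn / (tn + fp) if (tn + fp) else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "exon_rate": exon_rate, "intron_rate": intron_rate, "cc": cc}


# Toy data: a 6-nucleotide exon predicted one position too far downstream.
actual = "I" * 4 + "E" * 6 + "I" * 10
predicted = "I" * 5 + "E" * 6 + "I" * 9
metrics = nucleotide_metrics(actual, predicted)
print(metrics)
```

On this toy input, a one-nucleotide shift of the predicted exon costs one false negative and one false positive, which illustrates why nucleotide-level scores can stay high even when an exon boundary is not predicted exactly.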
Pages: 5156-5163
Page count: 8