THE PREDICTION OF EXONS THROUGH AN ANALYSIS OF SPLICEABLE OPEN READING FRAMES

Cited by: 47
Authors
HUTCHINSON, GB [1]
HAYDEN, MR [1]
Affiliations
[1] UNIV BRITISH COLUMBIA, DEPT MED GENET, VANCOUVER V6T 1W5, BC, CANADA
DOI
10.1093/nar/20.13.3453
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We have developed a computer program which predicts internal exons from naive genomic sequence data and which will run on any IBM-compatible 80286 (or higher) computer. The algorithm searches a sequence for 'spliceable open reading frames' (SORFs), which are open reading frames bracketed by suitable splice-recognition sequences, and then analyzes the region for codon usage. Potential exons are stratified according to the reliability of their prediction, from confidence levels 1 to 5. The program is designed to predict internal exons of length greater than 60 nucleotides. In an analysis of 116 genes of a training set, 384 out of 441 such exons (87.1%) are identified, with 280 (63.5%) of predictions matching the true exon exactly (at both 5' and 3' splice junctions and in the correct reading frame), and with 104 (23.6%) exons matching partially. In a similar analysis of 14 genes in a test set unrelated to the genes used to generate the parameters of the program, 70 out of 80 internal exons greater than 60 bp in length are identified (87.5%), with 47 completely and 23 partially matched. SORFs that partially match true internal exons share at least one splice junction with the exon, or share both splice junctions but are interpreted in an incorrect reading frame. Specificity (the percentage of SORFs that correspond to true exons) varies from 91% at confidence level 1 to 16% at confidence level 5, with an overall specificity of 35 - 40%. The output displays nucleotide position, confidence level, reading frame phase at the 5' and 3' ends, acceptor and donor sequences and scoring statistics and also gives an amino acid translation of the potential exon. SORFIND compares favourably with other programs currently used to predict protein-coding regions.
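The idea of a 'spliceable open reading frame' described in the abstract can be illustrated with a minimal sketch. This is not SORFIND itself. It assumes the simplest possible splice signals (an `AG` dinucleotide immediately upstream of the exon and a `GT` dinucleotide immediately downstream), and it omits the splice-site scoring, codon usage analysis and confidence levels that the program applies. The names `open_frames`, `find_sorfs` and `min_len` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a naive scan for candidate internal exons.
# Assumptions (not from the paper): splice sites are recognised purely by
# the AG (acceptor) and GT (donor) dinucleotides, with no consensus scoring
# and no codon usage analysis.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(segment):
    """Return the frame offsets (0, 1, 2) in which segment has no stop codon."""
    frames = []
    for offset in range(3):
        codons = (segment[i:i + 3] for i in range(offset, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(offset)
    return frames


def find_sorfs(seq, min_len=61):
    """Yield (start, end, frames) for each candidate exon seq[start:end].

    A candidate is immediately preceded by AG, immediately followed by GT,
    is at least min_len nucleotides long (the abstract targets exons longer
    than 60 nucleotides), and is open in at least one reading frame.
    """
    seq = seq.upper()
    # Exon starts just after an AG; exon ends (exclusive) just before a GT.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    for start in acceptors:
        for end in donors:
            if end - start < min_len:
                continue
            frames = open_frames(seq[start:end])
            if frames:
                yield start, end, frames


# Toy input: a 63-nt exon with a TAA stop in frame 0, flanked by AG and GT.
toy_exon = "CCC" * 10 + "TAA" + "CCC" * 10
toy_seq = "TTTCAG" + toy_exon + "GTCCCC"
candidates = list(find_sorfs(toy_seq))
```

On this toy sequence the only candidate spans positions 6 to 69 and is open in frames 1 and 2 but not frame 0, which shows why a reading frame has to be reported alongside each predicted exon. A real predictor would rank such candidates rather than report all of them, since bare `AG`/`GT` pairs occur very frequently by chance.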
Pages: 3453-3462
Number of pages: 10