PREDICTION OF GENE STRUCTURE

Cited by: 234
Authors
GUIGO, R
KNUDSEN, S
DRAKE, N
SMITH, T
Affiliations
[1] HARVARD UNIV,SCH MED,DANA FARBER CANC INST,MOLEC BIOL COMP RES RESOURCE,BOSTON,MA 02115
[2] HARVARD UNIV,SCH PUBL HLTH,BOSTON,MA 02115
Keywords
GENE IDENTIFICATION; EXON STRUCTURE; INTRON SPLICING; CODING SEQUENCE; ARTIFICIAL INTELLIGENCE;
DOI
10.1016/0022-2836(92)90130-C
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
We have developed a hierarchical rule-based system for identifying genes in DNA sequences. Atomic sites (such as initiation codons, stop codons, acceptor sites and donor sites) are identified by a number of different methods and evaluated by a set of filters and rules chosen to maximize sensitivity; these are combined into higher-order gene elements (such as exons), evaluated, filtered and combined as equivalence classes into probable genes, which are evaluated and ranked. The system has been tested on an extensive collection of vertebrate genes smaller than 15,000 bases. Results obtained show that, on average, 88% of the predicted coding region for a transcription unit is actually coding, and 80% of the actual coding region is correctly predicted. This will, in most applications, be sufficient for a search against protein sequence databases for the identification of probable gene function. In addition, the system provides a general test platform for both gene atomic site identification and the rules for their evaluation and assembly. © 1992.
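The two accuracy figures in the abstract (the share of predicted coding sequence that is truly coding, and the share of true coding sequence that is recovered) can be illustrated with a small calculation. The sketch below is only an assumption about how such figures might be computed: it counts overlap per nucleotide between predicted and annotated exons, which the abstract does not confirm as the authors' procedure. The function names and the toy coordinates are hypothetical and are not taken from the paper.

```python
def covered_positions(intervals):
    """Return the set of base positions covered by inclusive (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def coding_overlap_fractions(predicted_exons, actual_exons):
    """Compare predicted and annotated coding regions base by base.

    Returns a pair:
      - fraction of predicted coding bases that are actually coding
      - fraction of actual coding bases that were predicted
    Assumption: per-nucleotide overlap is used purely for illustration.
    """
    predicted = covered_positions(predicted_exons)
    actual = covered_positions(actual_exons)
    shared = predicted & actual
    predicted_that_is_coding = len(shared) / len(predicted) if predicted else 0.0
    actual_that_is_predicted = len(shared) / len(actual) if actual else 0.0
    return predicted_that_is_coding, actual_that_is_predicted


# Hypothetical toy coordinates (1-based, inclusive); not data from the paper.
predicted_exons = [(101, 200), (301, 400)]
actual_exons = [(121, 200), (301, 420), (501, 550)]

specificity_like, sensitivity_like = coding_overlap_fractions(predicted_exons, actual_exons)
print(f"predicted coding that is actually coding: {specificity_like:.2f}")
print(f"actual coding that is correctly predicted: {sensitivity_like:.2f}")
```

In this toy case the prediction covers 200 bases, 180 of which are annotated as coding, while the annotation covers 250 bases in total. The abstract's averages of 88% and 80% would correspond to the first and second returned values respectively, if the same per-base definition applies.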
Pages: 141-157
Number of pages: 17