NEURAL NETWORK DETECTS ERRORS IN THE ASSIGNMENT OF MESSENGER-RNA SPLICE SITES

Cited by: 49
Authors
BRUNAK, S
ENGELBRECHT, J
KNUDSEN, S
Affiliations
[1] ROYAL VET & AGR UNIV,DEPT DAIRY & FOOD SCI,DK-1870 FREDERIKSBERG C,DENMARK
[2] UNIV COPENHAGEN,INST MICROBIOL,DK-1353 COPENHAGEN C,DENMARK
DOI
10.1093/nar/18.16.4797
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The use of databanks in genetic research assumes reliability of the information they contain. Currently, error detection in the manually or electronically entered data contained in the nucleotide sequence databanks at EMBL, Heidelberg, and GenBank at Los Alamos is limited. We have used a subset of sequences from these databanks to train neural networks to recognize pre-mRNA splicing signals in human genes. During training on 33 human genes from the EMBL databank, seven genes appeared to disturb the learning process. Subsequent investigation revealed discrepancies from the original published papers for three genes. In four genes, we found wrongly assigned splicing frames of introns. We believe this reflects the fact that splicing frames cannot always be unambiguously assigned on the basis of experimental data. Thus, incorrect assignments arise both from mere typographical misprints and from erroneous interpretation of experiments. Training on 241 human sequences from GenBank revealed nine new errors. We propose that such errors could be detected by computer algorithms designed to check the consistency of data prior to their incorporation in databanks. © 1990 Oxford University Press.
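The kind of pre-incorporation consistency check the abstract proposes can be illustrated with a much simpler rule than the neural networks the authors trained. The sketch below is not their method: it only tests whether each annotated intron begins with GT and ends with AG, the canonical splice-site dinucleotides. The function name `flag_suspect_introns`, the 0-based half-open coordinate convention, and the toy sequence are all assumptions made for illustration.

```python
from typing import List, Tuple


def flag_suspect_introns(
    sequence: str, introns: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Return annotated introns that lack the canonical GT...AG boundaries.

    `introns` holds 0-based, half-open (start, end) coordinates on `sequence`.
    This is a hypothetical rule-based check, not the neural-network approach
    described in the abstract.
    """
    seq = sequence.upper()
    suspects = []
    for start, end in introns:
        intron = seq[start:end]
        # An intron needs room for both dinucleotides and must carry them.
        canonical = (
            len(intron) >= 4
            and intron.startswith("GT")
            and intron.endswith("AG")
        )
        if not canonical:
            suspects.append((start, end))
    return suspects


# Made-up toy record: exon + intron + exon.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"

# The first annotation matches the intron; the second is shifted by one base,
# mimicking a wrongly assigned splice site.
print(flag_suspect_introns(toy, [(6, 18)]))
print(flag_suspect_introns(toy, [(7, 19)]))
```

Because some genuine introns use non-canonical boundaries, a flagged entry is only a candidate for manual comparison with the original publication, not a confirmed error.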
Pages: 4797-4801
Page count: 5