Evaluation of human-readable annotation in biomolecular sequence databases with biological rule libraries

Cited by: 33
Authors
Eisenhaber, F
Bork, P
Affiliations
[1] European Mol Biol Lab, D-69012 Heidelberg, Germany
[2] Max Delbruck Ctr Mol Med, D-13122 Berlin, Germany
DOI
10.1093/bioinformatics/15.7.528
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Computer-based selection of entries from sequence databases with respect to a related functional description, e.g. with respect to a common cellular localization or contributing to the same phenotypic function, is a difficult task. Automatic semantic analysis of annotations is not only hampered by incomplete functional assignments. A major problem is that annotations are written in a rich, non-formalized language and are meant for reading by a human expert. This person can extract from the text considerably more information than is immediately apparent, owing to extended biological background knowledge and logical reasoning.

Approach: A technique of automated annotation evaluation based on a combination of lexical analysis and the usage of biological rule libraries has been developed. The proposed algorithm generates new functional descriptors from the annotation of a given entry, using the semantic units of the annotation as premises for implications executed in accordance with the rule library.

Results: The prototype of a software system, the Meta_A(annotator) program, is described, and the results of its application to sequence attribute assignment and sequence domain annotation of SWISS-PROT entries are presented. The current software version assigns useful subcellular localization qualifiers to approximately 88% of all SWISS-PROT entries. As shown by demonstrative examples, the combination of sequence and annotation analysis is a powerful approach for the detection of mutual annotation/sequence inconsistencies.

Availability: The software is available from Frank.Eisenhaber@embl-heidelberg.de. Results of the cellular localization assignment can be viewed at the URL http://www.bork.embl-heidelberg.de/CELL_LOC/CELL_LOC.html.

Contact: Frank.Eisenhaber@EMBL-Heidelberg.DE.
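
The approach summarized in the abstract (lexical analysis of the annotation text, followed by implications drawn from a rule library) might be sketched roughly as below. This is only an illustrative toy, not the Meta_A program: the vocabulary, the rules, and the function names `extract_terms` and `derive_descriptors` are all hypothetical, and the sketch uses plain forward chaining as one simple way to execute implications.

```python
import re

# Hypothetical rule library: each rule maps a set of premise terms to one
# derived descriptor. These rules are illustrative, not taken from the paper.
RULES = [
    ({"histone"}, "nuclear"),
    ({"transcription factor"}, "nuclear"),
    ({"signal peptide", "hormone"}, "extracellular"),
    ({"nuclear"}, "intracellular"),
]

# Hypothetical vocabulary of semantic units recognised by the lexical step.
VOCABULARY = ["histone", "transcription factor", "signal peptide", "hormone"]


def extract_terms(annotation):
    """Lexical analysis: find known vocabulary terms in free-text annotation."""
    text = annotation.lower()
    return {
        term
        for term in VOCABULARY
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    }


def derive_descriptors(annotation, rules=RULES):
    """Forward chaining: apply rules until no new descriptor is produced."""
    facts = extract_terms(annotation)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known facts.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Calling `derive_descriptors` on an annotation string returns the terms found in the text together with every descriptor implied by the rules, so a chained rule (here, "nuclear" implying "intracellular") is applied automatically once its premise has been derived.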
Pages: 528 - 535
Page count: 8
Related papers
28 in total
  • [1] Virgil: a database of rich links between GDB and GenBank
    Achard, F
    Barillot, E
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (01) : 100 - 101
  • [2] GenXref VI: automatic generation of links between two heterogeneous databases
    Achard, F
    Dessen, P
    [J]. BIOINFORMATICS, 1998, 14 (01) : 20 - 24
  • [3] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [4] Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families
    Andrade, MA
    Valencia, A
    [J]. BIOINFORMATICS, 1998, 14 (07) : 600 - 607
  • [5] ANDRADE MA, 1997, ISMB, V5, P25
  • [6] The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998
    Bairoch, A
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (01) : 38 - 42
  • [7] The PROSITE database, its status in 1997
    Bairoch, A
    Bucher, P
    Hofmann, K
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (01) : 217 - 221
  • [8] Predicting function: From genes to genomes and back
    Bork, P
    Dandekar, T
    Diaz-Lazcoz, Y
    Eisenhaber, F
    Huynen, M
    Yuan, YP
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1998, 283 (04) : 707 - 725
  • [9] A superfamily of conserved domains in DNA damage responsive cell cycle checkpoint proteins
    Bork, P
    Hofmann, K
    Bucher, P
    Neuwald, AF
    Altschul, SF
    Koonin, EV
    [J]. FASEB JOURNAL, 1997, 11 (01) : 68 - 76
  • [10] The modular architecture of a new family of growth regulators related to connective tissue growth factor
    Bork, P
    [J]. FEBS LETTERS, 1993, 327 (02) : 125 - 130