A Chado case study: an ontology-based modular schema for representing genome-associated biological information

Cited by: 190
Authors
Mungall, Christopher J.
Emmert, David B.
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
Funding
UK Medical Research Council
DOI
10.1093/bioinformatics/btm189
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused.
Results: Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences.
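The idea of typing entities through ontology terms can be illustrated with a minimal sketch. It assumes a heavily simplified three-table layout loosely modelled on Chado's `cv`, `cvterm` and `feature` tables; the real schema has many more columns and constraints, and the feature names and the helper `features_of_type` below are hypothetical. The point is only that a feature's type is a foreign key to a controlled-vocabulary term rather than a hard-coded table or column.

```python
import sqlite3

# Simplified illustration only: NOT the actual Chado DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (                 -- a controlled vocabulary / ontology
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (             -- one term within a vocabulary
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (            -- any sequence feature, typed by a term
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# One vocabulary with three terms (names chosen for illustration).
conn.execute("INSERT INTO cv (cv_id, name) VALUES (1, 'sequence')")
conn.executemany(
    "INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (?, 1, ?)",
    [(1, "gene"), (2, "mRNA"), (3, "exon")],
)

# Hypothetical features; each row's type is a reference to a term.
conn.executemany(
    "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
    [
        ("example_gene_1", 1),
        ("example_mrna_1", 2),
        ("example_exon_1", 3),
        ("example_exon_2", 3),
    ],
)


def features_of_type(connection, type_name):
    """Return the unique names of all features typed with the given term."""
    rows = connection.execute(
        """
        SELECT f.uniquename
        FROM feature AS f
        JOIN cvterm AS t ON f.type_id = t.cvterm_id
        WHERE t.name = ?
        ORDER BY f.uniquename
        """,
        (type_name,),
    )
    return [row[0] for row in rows]


exon_names = features_of_type(conn, "exon")
```

Under this layout, supporting a new kind of feature means adding a row to the term table rather than altering the schema, which is one plausible reading of why ontology-driven typing helps a schema stay generic.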
Pages: i337-i346
Number of pages: 10