BIOZON: a system for unification, management and analysis of heterogeneous biological data

Cited by: 65
Authors
Birkland, A [1]
Yona, G [1]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Funding
National Science Foundation (USA)
DOI
10.1186/1471-2105-7-70
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Among the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Description: Here we present a system (Biozon) that addresses these problems and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated. Conclusion: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
Pages: 27