BIOZON: a hub of heterogeneous biological data

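Cited by: 23
Authors
Birkland, Aaron [1]
Yona, Golan [1]
Affiliations
[1] Cornell Univ, Ithaca, NY USA
Keywords
DOI
10.1093/nar/gkj153
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract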
Biological entities are strongly related and mutually dependent. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein-protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated onto a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object, one that cannot be determined from any single source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32 000 protein structures, 150 000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries and rank the results based on an analysis of the edge structure similar to Google PageRank. Biozon is available online at Biozon.org.
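The two ideas in the abstract, a cross-type query over a typed graph and a PageRank-style ranking of its nodes, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the node identifiers, the type names, the toy edges, and the helper functions `build_adjacency`, `cross_type_query` and `pagerank` are hypothetical and are not Biozon's schema, data or API. The ranking is plain textbook PageRank on an undirected graph, not necessarily the method Biozon uses.

```python
from collections import defaultdict

# Hypothetical toy data: node id -> node type (not real Biozon records).
nodes = {
    "prot:P1": "protein",
    "prot:P2": "protein",
    "struct:S1": "structure",
    "fam:F1": "domain_family",
    "path:W1": "pathway",
}
# Hypothetical undirected relations between the objects above.
edges = [
    ("struct:S1", "prot:P1"),  # structure of a protein
    ("prot:P1", "fam:F1"),     # protein belongs to a domain family
    ("prot:P2", "fam:F1"),
    ("prot:P1", "path:W1"),    # protein participates in a pathway
    ("prot:P1", "prot:P2"),    # protein-protein interaction
]

def build_adjacency(edge_list):
    """Build a symmetric adjacency map from a list of node pairs."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def cross_type_query(adj, node_types, start_type, via_type, end_type):
    """Return start_type nodes that reach an end_type node through one via_type node."""
    hits = set()
    for node, node_type in node_types.items():
        if node_type != start_type:
            continue
        for mid in adj[node]:
            if node_types[mid] != via_type:
                continue
            if any(node_types[end] == end_type for end in adj[mid]):
                hits.add(node)
    return hits

def pagerank(adj, node_types, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration, treating each edge as bidirectional."""
    n = len(node_types)
    rank = {v: 1.0 / n for v in node_types}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in node_types}
        for v in node_types:
            neighbours = adj[v]
            if not neighbours:
                # Isolated node: spread its rank uniformly over all nodes.
                for u in node_types:
                    new_rank[u] += damping * rank[v] / n
            else:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new_rank[u] += share
        rank = new_rank
    return rank

adj = build_adjacency(edges)
# Cross-type query: structures whose protein takes part in some pathway.
hits = cross_type_query(adj, nodes, "structure", "protein", "pathway")
ranks = pagerank(adj, nodes)
top_node = max(ranks, key=ranks.get)
```

In this toy graph the query crosses three object types (structure, protein, pathway) in one step each, and the ranking favours the most highly connected object.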
Pages: D235-D242
Page count: 8
Related papers
23 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   BIND - The Biomolecular Interaction Network Database [J].
Bader, GD ;
Donaldson, I ;
Wolting, C ;
Ouellette, BFF ;
Pawson, T ;
Hogue, CWV .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :242-245
[3]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[4]   Baker P G, 1998, Proc Int Conf Intell Syst Mol Biol, V6, P25
[5]   GenBank [J].
Benson, DA ;
Boguski, MS ;
Lipman, DJ ;
Ostell, J ;
Ouellette, BFF ;
Rapp, BA ;
Wheeler, DL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :12-17
[6]   CHEN J, 2003, BIOINFORMATICS, P147
[7]   K2/Kleisli and GUS: Experiments in integrated access to genomic data sources [J].
Davidson, SB ;
Crabtree, J ;
Brunk, BP ;
Schug, J ;
Tannen, V ;
Overton, GC ;
Stoeckert, CJ .
IBM SYSTEMS JOURNAL, 2001, 40 (02) :512-531
[8]   ETZOLD T, 1993, COMPUT APPL BIOSCI, V9, P49
[9]   The PIR-International protein sequence database [J].
George, DG ;
Barker, WC ;
Mewes, HW ;
Pfeiffer, F ;
Tsugita, A .
NUCLEIC ACIDS RESEARCH, 1996, 24 (01) :17-20
[10]   DiscoveryLink: A system for integrated access to life sciences data sources [J].
Haas, LM ;
Schwarz, PM ;
Kodali, P ;
Kotlar, E ;
Rice, JE ;
Swope, WC .
IBM SYSTEMS JOURNAL, 2001, 40 (02) :489-511