Biological sequences integrated: A relational database approach

Times cited: 4
Authors
Bergholz, A
Heymann, S
Schenk, JA
Freytag, JC
Affiliations
[1] Humboldt Univ, Inst Comp Sci, D-10099 Berlin, Germany
[2] Max Delbruck Ctr Mol Med, MDC, D-13125 Berlin, Germany
[3] Univ Potsdam, Inst Biochem & Biol, Dept Biotechnol, Golm, Germany
DOI
10.1023/A:1011958524279
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Over the last decade, the modeling and storage of biological data have been topics of wide interest for scientists working in biological and biomedical research. Currently, most of this data is still stored in text files, which leads to data redundancy and disorganized file collections. In this paper, we show how relational modeling techniques and relational database technology can be used to model and store biological sequence data, i.e., the data maintained in collections such as EMBL or SWISS-PROT, so that the needs of these application domains are better served. To this end, we propose a two-step approach. First, we model the structure (and therefore the meaning) of the data using the Entity-Relationship (ER) approach. Second, the ER model leads to a clean design of a relational database schema for storing and retrieving the DNA and protein data extracted from various sources. Our approach thus provides a clean basis for building complex biological applications that are more amenable to change and to porting than their file-based counterparts.
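The record does not reproduce the paper's actual schema, so the following is only a minimal sketch of the general ER-to-relational step the abstract describes. It assumes a hypothetical, simplified set of entity types (organism, entry, keyword, feature) chosen for illustration, and it uses Python's built-in `sqlite3` module as a stand-in for whatever relational DBMS the authors used. The sketch applies the standard mapping rules: a 1:N relationship becomes a foreign key, an M:N relationship becomes a junction table, and a weak entity gets a composite key that includes its owner's key.

```python
import sqlite3

# Hypothetical, simplified schema derived from an ER model of sequence entries.
# Table and column names are illustrative only, not the paper's design.
SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    species     TEXT NOT NULL UNIQUE
);
-- 1:N relationship Organism -> Entry becomes a foreign key in entry.
CREATE TABLE entry (
    accession   TEXT PRIMARY KEY,
    source_db   TEXT NOT NULL,  -- e.g. 'EMBL' or 'SWISS-PROT'
    mol_type    TEXT NOT NULL CHECK (mol_type IN ('DNA', 'protein')),
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    sequence    TEXT NOT NULL
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    term       TEXT NOT NULL UNIQUE
);
-- M:N relationship Entry <-> Keyword becomes a junction table.
CREATE TABLE entry_keyword (
    accession  TEXT NOT NULL REFERENCES entry(accession),
    keyword_id INTEGER NOT NULL REFERENCES keyword(keyword_id),
    PRIMARY KEY (accession, keyword_id)
);
-- Weak entity Feature: identified by its owning entry plus a feature number.
CREATE TABLE feature (
    accession  TEXT NOT NULL REFERENCES entry(accession),
    feature_no INTEGER NOT NULL,
    kind       TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    PRIMARY KEY (accession, feature_no)
);
"""


def get_or_create(conn, table, id_col, key_col, value):
    """Return the id for `value`, inserting the row only if it is not stored yet."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({key_col}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {key_col} = ?", (value,)
    ).fetchone()
    return row[0]


def load_entry(conn, accession, source_db, mol_type, species, sequence,
               keywords=(), features=()):
    """Store one sequence entry; shared organisms and keywords are stored once."""
    organism_id = get_or_create(conn, "organism", "organism_id", "species", species)
    conn.execute(
        "INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
        (accession, source_db, mol_type, organism_id, sequence),
    )
    for term in keywords:
        kw_id = get_or_create(conn, "keyword", "keyword_id", "term", term)
        conn.execute("INSERT OR IGNORE INTO entry_keyword VALUES (?, ?)",
                     (accession, kw_id))
    # Number features 1, 2, ... within each entry (the weak-entity key part).
    for no, (kind, start, end) in enumerate(features, start=1):
        conn.execute("INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
                     (accession, no, kind, start, end))


def entries_with_keyword(conn, term):
    """Join across the M:N and 1:N relationships to find entries by keyword."""
    return conn.execute(
        """SELECT e.accession, o.species
           FROM entry e
           JOIN organism o ON o.organism_id = e.organism_id
           JOIN entry_keyword ek ON ek.accession = e.accession
           JOIN keyword k ON k.keyword_id = ek.keyword_id
           WHERE k.term = ?
           ORDER BY e.accession""",
        (term,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy placeholder records, not real database entries.
    load_entry(conn, "TEST1", "EMBL", "DNA", "Homo sapiens", "ATGGCCTAA",
               keywords=["kinase"], features=[("CDS", 1, 9)])
    load_entry(conn, "TEST2", "SWISS-PROT", "protein", "Homo sapiens", "MAK",
               keywords=["kinase", "membrane"])
    print(entries_with_keyword(conn, "kinase"))
    print(conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0])
```

Because species names and keyword terms are stored once and referenced by id, loading a second entry for the same organism adds no second organism row. This is one concrete way a relational design avoids the redundancy that the abstract attributes to text-file storage.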
Pages: 145-159
Number of pages: 15