Object-oriented parsing of biological databases with Python

Cited by: 6
Authors
Ramu, C [1]
Gemünd, C [1]
Gibson, TJ [1]
Affiliations
[1] European Mol Biol Lab, Heidelberg, Germany
Keywords
DOI
10.1093/bioinformatics/16.7.628
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: While database activities in the biological area are increasing rapidly, rather little is done in the area of parsing them in a simple and object-oriented way. Results: We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISS-PROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in format structure. GENBANK has a very different format structure from EMBL and SWISS-PROT. Extracting the desired fields in an entry (for example, a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community; this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that access to all the databases is independent of their different formats, since the parsing instructions are hidden.
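The format-independent interface described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Entry, FlatFileParser, EmblParser and GenBankParser are hypothetical, the two toy records (accession X00001) are invented for illustration, and only the accession and sequence fields are handled. The sketch assumes the standard line tags of the two formats (AC and SQ for EMBL, ACCESSION and ORIGIN for GenBank, with // closing an entry in both).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Entry:
    """Format-independent view of one database entry (hypothetical class)."""
    accession: str
    sequence: str

    def subsequence(self, start: int, end: int) -> str:
        # Feature tables use 1-based, inclusive coordinates.
        return self.sequence[start - 1:end]


class FlatFileParser(ABC):
    """Common interface; format-specific rules stay hidden in the subclasses."""

    accession_tag = ""  # line prefix that carries the accession number
    sequence_tag = ""   # line prefix that opens the sequence block

    @abstractmethod
    def read_accession(self, line: str) -> str:
        ...

    def parse(self, text: str) -> Entry:
        accession, residues, in_sequence = "", [], False
        for line in text.splitlines():
            if line.startswith("//"):  # end-of-entry marker in both formats
                break
            if in_sequence:
                # Keep letters only: drops spacing and position counters.
                residues.extend(ch for ch in line if ch.isalpha())
            elif line.startswith(self.accession_tag):
                accession = self.read_accession(line)
            elif line.startswith(self.sequence_tag):
                in_sequence = True
        return Entry(accession, "".join(residues).lower())


class EmblParser(FlatFileParser):
    accession_tag = "AC   "
    sequence_tag = "SQ   "

    def read_accession(self, line: str) -> str:
        # "AC   X00001; X00002;" -> first (primary) accession
        return line[2:].split(";")[0].strip()


class GenBankParser(FlatFileParser):
    accession_tag = "ACCESSION"
    sequence_tag = "ORIGIN"

    def read_accession(self, line: str) -> str:
        parts = line.split()
        return parts[1] if len(parts) > 1 else ""


# Toy records, invented for illustration only.
EMBL_TEXT = (
    "ID   X00001; SV 1; linear; DNA; STD; UNC; 12 BP.\n"
    "AC   X00001;\n"
    "SQ   Sequence 12 BP;\n"
    "     acgtacgtac gt                                        12\n"
    "//\n"
)
GENBANK_TEXT = (
    "LOCUS       X00001   12 bp    DNA     linear   UNC\n"
    "ACCESSION   X00001\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)

if __name__ == "__main__":
    for parser, text in ((EmblParser(), EMBL_TEXT), (GenBankParser(), GENBANK_TEXT)):
        entry = parser.parse(text)
        print(type(parser).__name__, entry.accession, entry.subsequence(3, 6))
```

Calling code only sees Entry objects, so the same sub-sequence extraction works regardless of which flat-file format the record came from.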
Pages: 628-638
Page count: 11