XML schemas for common bioinformatic data types and their application in workflow systems

Cited by: 29
Authors
Seibel, Philipp N. [1]
Krueger, Jan
Hartmeier, Sven
Schwarzer, Knut
Lowenthal, Kai
Mersch, Henning
Dandekar, Thomas
Giegerich, Robert
Affiliations
[1] Univ Wurzburg, Dept Bioinformat, Bioctr, Wurzburg, Germany
[2] Univ Bielefeld, Fac Technol, Bioinformat Grp, Pract Comp Sci Dept, D-4800 Bielefeld, Germany
[3] Univ Gottingen, Dept Bioinformat, UKG, D-3400 Gottingen, Germany
[4] Res Ctr Julich, Distributed Syst & Grid Comp, Cent Inst Appl Math, Julich, Germany
Keywords
DOI
10.1186/1471-2105-7-490
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: There is a growing need in bioinformatics to combine available software tools into chains, building complex applications from existing single-task tools. To create such workflows, the tools involved must be able to work with each other's data; a common set of well-defined data formats is therefore needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.

Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.

Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
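
To illustrate what "formally defined by XML schemas" can mean for a tool in a workflow, here is a minimal sketch of schema validation using only the standard JDK `javax.xml.validation` API. The schema is a hypothetical toy definition written for this example (a `sequence` element with a required `id` attribute and nucleotide-only content); it is not one of the HOBIT schemas, and the class and method names are likewise assumptions. The BioDOM API itself is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SequenceSchemaSketch {
    // Hypothetical toy schema (an assumption, NOT a HOBIT schema):
    // <sequence id="..."> whose text may contain only A, C, G, T, U or N.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='nucleotides'>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:pattern value='[ACGTUN]+'/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name='sequence'>"
      + "<xs:complexType>"
      + "<xs:simpleContent>"
      + "<xs:extension base='nucleotides'>"
      + "<xs:attribute name='id' type='xs:string' use='required'/>"
      + "</xs:extension>"
      + "</xs:simpleContent>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the XML text is well-formed and conforms to TOY_XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conforming document: id present, nucleotide-only content.
        System.out.println(isValid("<sequence id='s1'>ACGU</sequence>"));
        // Non-conforming document: the required id attribute is missing.
        System.out.println(isValid("<sequence>ACGU</sequence>"));
    }
}
```

A tool that validates its input this way can reject malformed data before processing it, which is one reason a shared, program-independent schema helps when chaining tools.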
Pages: 11