KNIME-CDK: Workflow-driven cheminformatics

Cited by: 106
Authors
Beisken, Stephan [1]
Meinl, Thorsten [2]
Wiswedel, Bernd [3]
de Figueiredo, Luis F. [1]
Berthold, Michael [2]
Steinbeck, Christoph [1]
Affiliations
[1] EBI, EMBL, Cambridge, England
[2] Univ Konstanz, Nycomed Chair Bioinformat & Informat Min, Constance, Germany
[3] KNIME Com AG, CH-8005 Zurich, Switzerland
Source
BMC BIOINFORMATICS | 2013 / Volume 14
Keywords
Cheminformatics; Workflows; Data integration; Software library; GENERATION SEQUENCING DATA;
DOI
10.1186/1471-2105-14-257
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Cheminformaticians routinely have to process and analyse libraries of small molecules. Among other things, this includes the standardization of molecules, the calculation of various descriptors, the visualisation of molecular structures, and downstream analysis. Scientific workflow platforms such as the Konstanz Information Miner can be used for this purpose if provided with the right plug-in. A workflow-based cheminformatics tool offers ease of use and interoperability between complementary cheminformatics packages within the same framework, thereby facilitating the analysis process. Results: KNIME-CDK comprises functions for molecule conversion to and from common formats and for the generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Toolkit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest. Conclusions: KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Toolkit and allows for efficient cross-vendor structural cheminformatics. Its ease of use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.
Pages: 4