The Gaggle: An open-source software system for integrating bioinformatics software and data sources

Times cited: 113
Authors
Shannon, Paul T.
Reiss, David J.
Bonneau, Richard
Baliga, Nitin S.
Affiliations
[1] Inst Syst Biol, Seattle, WA 98103 USA
[2] New York Univ, Dept Biol, New York, NY 10003 USA
Funding
National Science Foundation (USA)
DOI
10.1186/1471-2105-7-176
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Systems biologists work with many kinds of data, from many different sources, using a variety of software tools. Each of these tools typically excels at one type of analysis, such as the analysis of microarrays, metabolic networks, or predicted protein structure. A crucial challenge is to combine the capabilities of these (and other forthcoming) data resources and tools to create a data exploration and analysis environment that does justice to the variety and complexity of systems biology data sets. A solution to this problem should recognize that data types, formats and software in this high-throughput age of biology are constantly changing.
Results: In this paper we describe the Gaggle, a simple, open-source Java software environment that helps to solve the problem of software and database integration. Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it integrates existing popular programs and web resources into a user-friendly, easily extended environment. We demonstrate that four simple data types (names, matrices, networks, and associative arrays) are sufficient to bring together diverse databases and software. We highlight some capabilities of the Gaggle with an exploration of Helicobacter pylori pathogenesis genes, in which we identify a putative ricin-like protein, a discovery made possible by simultaneous data exploration using a wide range of publicly available data and a variety of popular bioinformatics software tools.
Conclusion: We have integrated diverse databases (for example, KEGG, BioCyc, STRING) and software (Cytoscape, DataMatrixViewer, the R statistical environment, and the TIGR Microarray Expression Viewer). Through this loose coupling of diverse software and databases, the Gaggle enables simultaneous exploration of experimental data (mRNA and protein abundance, protein-protein and protein-DNA interactions), functional associations (operon, chromosomal proximity, phylogenetic pattern), metabolic pathways (KEGG) and PubMed abstracts (STRING web resource), creating an exploratory environment useful to 'web browser and spreadsheet biologists', to statistically savvy computational biologists, and to those in between. The Gaggle uses Java RMI and Java Web Start technologies and can be found at http://gaggle.systemsbiology.net.
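The abstract's central technical claim is that four simple data types, passed between loosely coupled tools, are enough for integration. The sketch below illustrates that idea only; it is not the Gaggle's actual API. Every class and method name (BroadcastSketch, Names, Matrix, Network, AssociativeArray, Tool, Hub, RecordingTool) is a hypothetical stand-in, the gene identifiers are placeholders, and messages are relayed in-process rather than over Java RMI as the real system does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BroadcastSketch {

    // Hypothetical containers for the four message kinds named in the abstract.
    static final class Names {
        final List<String> names;
        Names(List<String> names) { this.names = names; }
    }

    static final class Matrix {
        final List<String> rowNames;
        final List<String> columnNames;
        final double[][] values; // values[row][column]
        Matrix(List<String> rowNames, List<String> columnNames, double[][] values) {
            this.rowNames = rowNames;
            this.columnNames = columnNames;
            this.values = values;
        }
    }

    static final class Network {
        // Each edge is {source, interaction type, target}.
        final List<String[]> edges = new ArrayList<>();
        void addEdge(String source, String interaction, String target) {
            edges.add(new String[] {source, interaction, target});
        }
    }

    static final class AssociativeArray {
        final Map<String, String> entries = new LinkedHashMap<>();
    }

    // A tool only has to accept messages; it knows nothing about the other tools.
    interface Tool {
        void receive(Object message);
    }

    // Hypothetical relay: forwards a message to every registered tool except the sender.
    static final class Hub {
        private final List<Tool> tools = new ArrayList<>();

        void register(Tool tool) { tools.add(tool); }

        int broadcast(Tool sender, Object message) {
            int delivered = 0;
            for (Tool tool : tools) {
                if (tool != sender) {
                    tool.receive(message);
                    delivered++;
                }
            }
            return delivered;
        }
    }

    // Minimal tool that just remembers what it was sent.
    static final class RecordingTool implements Tool {
        final List<Object> inbox = new ArrayList<>();
        @Override public void receive(Object message) { inbox.add(message); }
    }

    public static void main(String[] args) {
        Hub hub = new Hub();
        RecordingTool networkViewer = new RecordingTool();
        RecordingTool matrixViewer = new RecordingTool();
        hub.register(networkViewer);
        hub.register(matrixViewer);

        // Placeholder identifiers, not real gene names.
        Names selection = new Names(Arrays.asList("geneA", "geneB"));
        int delivered = hub.broadcast(networkViewer, selection);
        System.out.println("delivered to " + delivered + " tool(s)");
    }
}
```

In this sketch the tools share nothing except the message types and the relay, which is one way to read the "separation of concerns" the abstract mentions: a new tool could be added by implementing the single receive method, without changes to the existing tools.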
Pages: 13