Dynameomics: design of a computational lab workflow and scientific data repository for protein simulations

Cited by: 41
Authors
Simms, Andrew M. [1]
Toofanny, Rudesh D. [2]
Kehl, Catherine [1]
Benson, Noah C. [1]
Daggett, Valerie [1, 2]
Affiliations
[1] Univ Washington, Biomed & Hlth Informat Program, Seattle, WA 98195 USA
[2] Univ Washington, Dept Bioengn, Seattle, WA 98195 USA
Keywords
data warehouse; database; Dynameomics; OLAP; protein dynamics
DOI
10.1093/protein/gzn012
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Dynameomics is a project to investigate and catalog the native-state dynamics and thermal unfolding pathways of representatives of all protein folds using solvated molecular dynamics simulations, as described in the preceding paper. Here we introduce the design of the molecular dynamics data warehouse, a scalable, reliable repository that houses simulation data that vastly simplifies management and access. In the succeeding paper, we describe the development of a complementary multidimensional database. A single protein unfolding or native-state simulation can take weeks to months to complete, and produces gigabytes of coordinate and analysis data. Mining information from over 3000 completed simulations is complicated and time-consuming. Even the simplest queries involve writing intricate programs that must be built from low-level file system access primitives and include significant logic to correctly locate and parse data of interest. As a result, programs to answer questions that require data from hundreds of simulations are very difficult to write. Thus, organization and access to simulation data have been major obstacles to the discovery of new knowledge in the Dynameomics project. This repository is used internally and is the foundation of the Dynameomics portal site http://www.dynameomics.org. By organizing simulation data into a scalable, manageable and accessible form, we can begin to address substantial questions that move us closer to solving biomedical and bioengineering problems.
Pages: 369-377
Number of pages: 9