Dynameomics: a multi-dimensional analysis-optimized database for dynamic protein data

Cited by: 36
Authors
Kehl, Catherine [1]
Simms, Andrew M. [1]
Toofanny, Rudesh D. [2]
Daggett, Valerie [1, 2]
Affiliations
[1] Univ Washington, Biomed & Hlth Informat Program, Seattle, WA 98195 USA
[2] Univ Washington, Dept Bioengn, Seattle, WA 98195 USA
Keywords
data warehouse; Dynameomics; molecular dynamics; protein dynamics; OLAP
DOI
10.1093/protein/gzn015
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
The Dynameomics project is our effort to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations, as described by Beck et al. (in Protein Eng. Des. Select., the first paper in this series). The data produced by these simulations are highly multidimensional in structure and multiple terabytes in size. Both of these features present significant challenges for storage, retrieval and analysis. For optimal data modeling and flexibility, we needed a platform that supported both multidimensional indices and hierarchical relationships between related types of data and that could be integrated within our data warehouse, as described in the accompanying paper directly preceding this one. For these reasons, we have chosen On-line Analytical Processing (OLAP), a multi-dimensional, analysis-optimized database, as an analytical platform for these data. OLAP is a mature technology in the financial sector, but it has not been used extensively for scientific analysis. Our project is furthermore unusual for its focus on the multidimensional and analytical capabilities of OLAP rather than its aggregation capacities. The dimensional data model and hierarchies are very flexible. The query language expresses complex analyses concisely and supports rapid data retrieval. OLAP shows great promise for dynamic protein analysis in bioengineering and biomedical applications. In addition, OLAP may have similar potential for other scientific and engineering applications involving large and complex datasets.
Pages: 379-386
Page count: 8
Related papers
9 items in total
[1]  
Beck D.A.C., 2000, LUCEM MOL MECH
[2]   Methods for molecular dynamics simulations of protein folding/unfolding in solution [J].
Beck, DAC ;
Daggett, V .
METHODS, 2004, 34 (01) :112-120
[3]   Dynameomics: mass annotation of protein dynamics and unfolding in water by high-throughput atomistic molecular dynamics simulations [J].
Beck, David A. C. ;
Jonsson, Amanda L. ;
Schaeffer, R. Dustin ;
Scott, Kathryn A. ;
Day, Ryan ;
Toofanny, Rudesh D. ;
Alonso, Darwin O. V. ;
Daggett, Valerie .
PROTEIN ENGINEERING DESIGN & SELECTION, 2008, 21 (06) :353-368
[4]   Towards data warehousing and mining of protein unfolding simulation data [J].
Berrar D. ;
Stahl F. ;
Silva C. ;
Rodrigues J.R. ;
Brito R.M.M. ;
Dubitzky W. .
Journal of Clinical Monitoring and Computing, 2005, 19 (4-5) :307-317
[5]  
CODD E, 1993, PROVIDING OLAP ONLIN
[6]   A consensus view of fold space: Combining SCOP, CATH, and the Dali Domain Dictionary [J].
Day, R ;
Beck, DAC ;
Armen, RS ;
Daggett, V .
PROTEIN SCIENCE, 2003, 12 (10) :2150-2160
[7]   UCSF chimera - A visualization system for exploratory research and analysis [J].
Pettersen, EF ;
Goddard, TD ;
Huang, CC ;
Couch, GS ;
Greenblatt, DM ;
Meng, EC ;
Ferrin, TE .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 2004, 25 (13) :1605-1612
[8]   Dynameomics: design of a computational lab workflow and scientific data repository for protein simulations [J].
Simms, Andrew M. ;
Toofanny, Rudesh D. ;
Kehl, Catherine ;
Benson, Noah C. ;
Daggett, Valerie .
PROTEIN ENGINEERING DESIGN & SELECTION, 2008, 21 (06) :369-377
[9]  
*WOLFR RES INC, 2005, MATHEMATICA