The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data

Cited by: 87
Authors
Wilks, Christopher [1 ]
Cline, Melissa S. [2 ]
Weiler, Erich [2 ]
Diekhans, Mark [2 ]
Craft, Brian [2 ]
Martin, Christy [3 ]
Murphy, Daniel [3 ]
Pierce, Howdy [4 ]
Black, John [4 ]
Nelson, Donavan [4 ]
Litzinger, Brian [3 ]
Hatton, Thomas [3 ]
Maltbie, Lori [3 ]
Ainsworth, Michael [3 ]
Allen, Patrick [3 ]
Rosewood, Linda [1 ]
Mitchell, Elizabeth [1 ]
Smith, Bradley [5 ]
Warner, Jim [5 ]
Groboske, John [1 ]
Telc, Haifang [1 ]
Wilson, Daniel [1 ]
Sanford, Brian [1 ]
Schmidt, Hannes [1 ]
Haussler, David [2 ]
Maltbie, Daniel [3 ]
Affiliations
[1] Univ Calif Santa Cruz, Sch Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Sch Engn, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[3] Annai Syst Inc, Carlsbad, CA 92011 USA
[4] Cardinal Peak LLC, Lafayette, CO 80026 USA
[5] Univ Calif Santa Cruz, Informat Technol Serv, Santa Cruz, CA 95064 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2014
Funding
National Institutes of Health (USA);
Keywords
DOI
10.1093/database/bau093
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains > 1.4 PB of data, has grown at an average rate of 50 TB a month and serves > 100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, to enforce patient genome confidentiality in data storage and transmission, and to optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs.
Pages: 10