A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly

Cited by: 111
Authors
Blankenberg, Daniel [1]
Taylor, James [1]
Schenck, Ian [1]
He, Jianbin [1]
Zhang, Yi [1]
Ghent, Matthew [1]
Veeraraghavan, Narayanan [1]
Albert, Istvan [1]
Miller, Webb [1]
Makova, Kateryna D. [1]
Hardison, Ross C. [1]
Nekrutenko, Anton [1]
Affiliations
[1] Pennsylvania State University, Huck Institutes of the Life Sciences, Center for Comparative Genomics and Bioinformatics, University Park, PA 16802, USA
DOI
10.1101/gr.5578007
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2(ENCODE), that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2(ENCODE) allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2(ENCODE) to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2(ENCODE) with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2(ENCODE) and the screencasts can be accessed at http://g2.bx.psu.edu.
Pages: 960-964
Page count: 5