HAL: a hierarchical format for storing and analyzing multiple genome alignments

Cited by: 108
Authors
Hickey, Glenn [1]
Paten, Benedict [1]
Earl, Dent [1]
Zerbino, Daniel [1]
Haussler, David [1, 2]
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
Keywords
SEQUENCE
DOI
10.1093/bioinformatics/btt128
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. However, current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as with their phylogenetic distance.
Results: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation caused by rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and mapping coordinates between genomes (liftover).
Pages: 1341-1342
Page count: 2