Industrial applications of high-performance computing for phylogeny reconstruction

Cited by: 23
Authors
Bader, DA [1]
Moret, BME [1]
Vawter, L [1]
Affiliation
[1] Univ New Mexico, Dept Elect & Comp Engn, Albuquerque, NM 87131 USA
Source
COMMERCIAL APPLICATIONS FOR HIGH-PERFORMANCE COMPUTING | 2001 / Vol. 4528
Keywords
high-performance computing; computational genomics; phylogeny reconstruction; breakpoint analysis; gene rearrangement; drug discovery
DOI
10.1117/12.434868
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest in determining these relationships is strong. For example, pharmaceutical companies may use phylogeny reconstruction in drug discovery to identify synthetic pathways unique to the organisms they wish to target. Health organizations study the phylogenies of organisms such as HIV in order to understand their epidemiologies and to help predict the behavior of future outbreaks. Governments are interested in aiding the production of foodstuffs such as rice, wheat and potatoes through genetics, by understanding the phylogenetic distribution of genetic variation in wild populations. Yet few techniques are available for difficult phylogenetic reconstruction problems. Appropriate tools for analyzing such data may help resolve some of the phylogenetic problems that have been studied for decades without much resolution. With the rapid accumulation of whole genome sequences for a wide diversity of taxa, especially microbial taxa, phylogenetic reconstruction based on changes in gene order and gene content is showing promise, particularly for resolving deep (i.e., ancient) branch splits. However, reconstruction from gene-order data is even more computationally expensive than reconstruction from sequence data, particularly in groups with large numbers of genes and highly rearranged genomes. We have developed a software suite, GRAPPA, that extends the breakpoint analysis (BPAnalysis) method of Sankoff and Blanchette while running much faster: in a recent analysis of chloroplast genome data for species of Campanulaceae on a 512-processor Linux supercluster with Myrinet, we achieved a one-million-fold speedup over BPAnalysis. GRAPPA can use either breakpoint or inversion distance (computed exactly) for its computation and runs on single-processor machines as well as parallel and high-performance computers.
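The abstract names breakpoint distance as one of the two measures GRAPPA can use. As an illustration only, and not GRAPPA's own code, the following minimal Python sketch counts breakpoints between two signed gene orders. It assumes both genomes contain the same genes, each exactly once, encoded as signed integers (a negative sign marking a reversed gene), and it treats genomes as circular by default. The function name `breakpoint_distance` and the `circular` parameter are hypothetical, chosen here for the example.

```python
def breakpoint_distance(a, b, circular=True):
    """Count adjacencies of signed gene order `a` that do not occur in `b`.

    Assumption for this sketch: `a` and `b` hold the same genes, each once,
    as nonzero signed integers (sign = strand orientation).
    """
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("both genomes must contain the same set of genes")

    def adjacencies(order):
        # Consecutive gene pairs, plus the wrap-around pair for circular genomes.
        pairs = list(zip(order, order[1:]))
        if circular and len(order) > 1:
            pairs.append((order[-1], order[0]))
        return pairs

    # An adjacency (x, y) read on one strand is (-y, -x) on the other strand,
    # so both forms count as the same adjacency.
    present = set()
    for x, y in adjacencies(b):
        present.add((x, y))
        present.add((-y, -x))

    return sum(1 for pair in adjacencies(a) if pair not in present)


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted as a block
    print(breakpoint_distance(reference, inverted))
```

In this example a single inversion disrupts the two adjacencies at the ends of the inverted segment, while the adjacency inside the segment is preserved on the opposite strand.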
Pages: 159-168
Page count: 10