Parallel short sequence assembly of transcriptomes

被引:22
作者
Jackson, Benjamin G. [1 ]
Schnable, Patrick S. [2 ]
Aluru, Srinivas [1 ]
机构
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Ctr Plant Genom, Ames, IA 50011 USA
来源
BMC BIOINFORMATICS | 2009年 / 10卷
关键词
GENOME; MILLIONS; READS;
D O I
10.1186/1471-2105-10-S1-S14
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: The de novo assembly of genomes and transcriptomes from short sequences is a challenging problem. Because of the high coverage needed to assemble short sequences as well as the overhead of modeling the assembly problem as a graph problem, the methods for short sequence assembly are often validated using data from BACs or small sized prokaryotic genomes. Results: We present a parallel method for transcriptome assembly from large short sequence data sets. Our solution uses a rigorous graph theoretic framework and tames the computational and space complexity using parallel computers. First, we construct a distributed bidirected graph that captures overlap information. Next, we compact all chains in this graph to determine long unique contigs using undirected parallel list ranking, a problem for which we present an algorithm. Finally, we process this compacted distributed graph to resolve unique regions that are separated by repeats, exploiting the naturally occurring coverage variations arising from differential expression. Conclusion: We demonstrate the validity of our method using a synthetic high coverage data set generated from the predicted coding regions of Zea mays. We assemble 925 million sequences consisting of 40 billion nucleotides in a few minutes on a 1024 processor Blue Gene/L. Our method is the first fully distributed method for assembling a non-hierarchical short sequence data set and can scale to large problem sizes.
引用
收藏
页数:12
相关论文
共 18 条
  • [1] ALLPATHS: De novo assembly of whole-genome shotgun microreads
    Butler, Jonathan
    MacCallum, Iain
    Kleber, Michael
    Shlyakhter, Ilya A.
    Belmonte, Matthew K.
    Lander, Eric S.
    Nusbaum, Chad
    Jaffe, David B.
    [J]. GENOME RESEARCH, 2008, 18 (05) : 810 - 820
  • [2] DEHNE F, 1996, AS COMP SCI C ASIAN, P1
  • [3] DOHM J, 1997, GENOME RES, V17, P1697
  • [4] A strategy for assembling the maize (Zea mays L.) genome
    Emrich, SJ
    Aluru, S
    Fu, Y
    Wen, TJ
    Narayanan, M
    Guo, L
    Ashlock, DA
    Schnable, PS
    [J]. BIOINFORMATICS, 2004, 20 (02) : 140 - 147
  • [5] Helman David R, 1998, J EXPT ALGORITHMICS, V3, P4
  • [6] De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer
    Hernandez, David
    Francois, Patrice
    Farinelli, Laurent
    Osteras, Magne
    Schrenzel, Jacques
    [J]. GENOME RESEARCH, 2008, 18 (05) : 802 - 809
  • [7] JACKSON BG, 2008, PAR PROC 2008 ICPP 0, P346
  • [8] Karp R.M., 1972, COMPLEXITY COMPUTER, P85
  • [9] Medvedev P, 2008, LECT N BIOINFORMAT, V4955, P50
  • [10] Computability of models for sequence assembly
    Medvedev, Paul
    Georgiou, Konstantinos
    Myers, Gene
    Brudno, Michael
    [J]. ALGORITHMS IN BIOINFORMATICS, PROCEEDINGS, 2007, 4645 : 289 - 301