A parallel graph decomposition algorithm for DNA sequencing with nanopores

被引:10
作者
Bokhari, SH [1 ]
Sauer, JR
机构
[1] UET, Dept Elect Engn, Lahore 54890, Pakistan
[2] Eagle Res & Dev, Boulder, CO 80301 USA
关键词
D O I
10.1093/bioinformatics/bti129
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: With the potential availability of nanopore devices that can sense the bases of translocating single-stranded DNA (ssDNA), it is likely that 'reads' of length similar to 10(5) will be available in large numbers and at high speed. We address the problem of complete DNA sequencing using such reads. We assume that similar to 10(2) copies of a DNA sequence are split into single strands that break into randomly sized pieces as they translocate the nanopore in arbitrary orientations. The nanopore senses and reports each individual base that passes through, but all information about orientation and complementarity of the ssDNA subsequences is lost. Random errors (both biological and transduction) in the reads create further complications. Results: We have developed an algorithm that addresses these issues. It can be considered an extreme variation of the well-known Eulerian path approach. It searches over a space of de Bruijn graphs until it finds one in which (a) the impact of errors is eliminated and (b) both possible orientations of the two ssDNA sequences can be identified separately and unambiguously. Our algorithm is able to correctly reconstruct real DNA sequences of the order of 10(6) bases (e.g. the bacterium Mycoplasma pneumoniae) from simulated erroneous reads on a modest workstation in about 1 h. We describe, and give measured timings of, a parallel implementation of this algorithm on the Cray Multithreaded Architecture (MTA-2) supercomputer, whose architecture is ideally suited to this 'unstructured' problem. Our parallel implementation is crucial to the problem of rapidly sequencing long DNA sequences and also to the situation where multiple nanopores are used to obtain a high-bandwidth stream of reads.
引用
收藏
页码:889 / 896
页数:8
相关论文
共 19 条
[1]   Tera computer system [J].
Alverson, Robert ;
Callahan, David ;
Cummings, Daniel ;
Koblenz, Brian ;
Porterfield, Allan ;
Smith, Burton .
Conference Proceedings - International Conference on Supercomputing, 1990,
[2]  
[Anonymous], P 6 INT C SUP WASH D, DOI DOI 10.1145/143369.143408
[3]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[4]   Recent segmental duplications in the human genome [J].
Bailey, JA ;
Gu, ZP ;
Clark, RA ;
Reinert, K ;
Samonte, RV ;
Schwartz, S ;
Adams, MD ;
Myers, EW ;
Li, PW ;
Eichler, EE .
SCIENCE, 2002, 297 (5583) :1003-1007
[5]   Sequence alignment on the Cray MTA-2 [J].
Bokhari, SH ;
Sauer, JR .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2004, 16 (09) :823-839
[6]   Parallelizing a DNA simulation code for the Cray MTA-2 [J].
Bokhari, SH ;
Glaser, MA ;
Jordan, HF .
CSB2002: IEEE COMPUTER SOCIETY BIOINFORMATICS CONFERENCE, 2002, :291-302
[7]  
Branton D, 2002, NATO SCI S PRT 3 HI, V87, P177
[8]   Sequence-specific detection of individual DNA strands using engineered nanopores [J].
Howorka, S ;
Cheley, S ;
Bayley, H .
NATURE BIOTECHNOLOGY, 2001, 19 (07) :636-639
[9]  
Idury R M, 1995, J Comput Biol, V2, P291, DOI 10.1089/cmb.1995.2.291
[10]   Characterization of individual polynucleotide molecules using a membrane channel [J].
Kasianowicz, JJ ;
Brandin, E ;
Branton, D ;
Deamer, DW .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (24) :13770-13773