Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Cited by: 11
Authors
Du, Jiang [1]
Bjornson, Robert D. [1, 2]
Zhang, Zhengdong D. [3]
Kong, Yong [2]
Snyder, Michael [3, 4]
Gerstein, Mark B. [1, 3, 5]
Affiliations
[1] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[2] Yale Univ, Keck Biotechnol Resource Lab, New Haven, CT 06520 USA
[3] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[4] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06520 USA
[5] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
Keywords
COPY-NUMBER; BREAKPOINTS; ALGORITHM;
DOI
10.1371/journal.pcbi.1000432
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs.
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
Pages: 15
Related papers
30 in total
[1]   A haplotype map of the human genome [J].
Altshuler, D ;
Brooks, LD ;
Chakravarti, A ;
Collins, FS ;
Daly, MJ ;
Donnelly, P ;
Gibbs, RA ;
Belmont, JW ;
Boudreau, A ;
Leal, SM ;
Hardenbol, P ;
Pasternak, S ;
Wheeler, DA ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Zeng, CQ ;
Gao, Y ;
Hu, HR ;
Hu, WT ;
Li, CH ;
Lin, W ;
Liu, SQ ;
Pan, H ;
Tang, XL ;
Wang, J ;
Wang, W ;
Yu, J ;
Zhang, B ;
Zhang, QR ;
Zhao, HB ;
Zhao, H ;
Zhou, J ;
Gabriel, SB ;
Barry, R ;
Blumenstiel, B ;
Camargo, A ;
Defelice, M ;
Faggart, M ;
Goyette, M ;
Gupta, S ;
Moore, J ;
Nguyen, H ;
Onofrio, RC ;
Parkin, M ;
Roy, J ;
Stahl, E ;
Winchester, E ;
Ziaugra, L ;
Shen, Y .
NATURE, 2005, 437 (7063) :1299-1320
[2]   An MCMC algorithm for haplotype assembly from whole-genome sequence data [J].
Bansal, Vikas ;
Halpern, Aaron L. ;
Axelrod, Nelson ;
Bafna, Vineet .
GENOME RESEARCH, 2008, 18 (08) :1336-1346
[3]  
Batzoglou S, 2002, GENOME RES, V12, P177, DOI 10.1101/gr.208902
[4]   Whole-genome re-sequencing [J].
Bentley, David R. .
CURRENT OPINION IN GENETICS & DEVELOPMENT, 2006, 16 (06) :545-552
[5]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan ;
MacCallum, Iain ;
Kleber, Michael ;
Shlyakhter, Ilya A. ;
Belmonte, Matthew K. ;
Lander, Eric S. ;
Nusbaum, Chad ;
Jaffe, David B. .
GENOME RESEARCH, 2008, 18 (05) :810-820
[6]   Short read fragment assembly of bacterial genomes [J].
Chaisson, Mark J. ;
Pevzner, Pavel A. .
GENOME RESEARCH, 2008, 18 (02) :324-330
[7]   Finishing the euchromatic sequence of the human genome [J].
Collins, FS ;
Lander, ES ;
Rogers, J ;
Waterston, RH .
NATURE, 2004, 431 (7011) :931-945
[8]   SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing [J].
Dohm, Juliane C. ;
Lottaz, Claudio ;
Borodina, Tatiana ;
Himmelbauer, Heinz .
GENOME RESEARCH, 2007, 17 (11) :1697-1706
[9]   A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes [J].
Goldberg, Susanne M. D. ;
Johnson, Justin ;
Busam, Dana ;
Feldblyum, Tamara ;
Ferriera, Steve ;
Friedman, Robert ;
Halpern, Aaron ;
Khouri, Hoda ;
Kravitz, Saul A. ;
Lauro, Federico M. ;
Li, Kelvin ;
Rogers, Yu-Hui ;
Strausberg, Robert ;
Sutton, Granger ;
Tallon, Luke ;
Thomas, Torsten ;
Venter, Eli ;
Frazier, Marvin ;
Venter, J. Craig .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2006, 103 (30) :11240-11245
[10]   Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: A platform comparison based on statistical power analysis [J].
Hehir-Kwa, Jayne Y. ;
Egmont-Petersen, Michael ;
Janssen, Irene M. ;
Smeets, Dominique ;
Van Kessel, Ad Geurts ;
Veltman, Joris A. .
DNA RESEARCH, 2007, 14 (01) :1-11