Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes

Cited by: 200
Authors
Hormozdiari, Fereydoun [2]
Alkan, Can [1, 3]
Eichler, Evan E. [1, 3]
Sahinalp, S. Cenk [2]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
COPY-NUMBER; GENETIC-VARIATION; TECHNOLOGIES; MAP;
DOI
10.1101/gr.088633.108
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The realization of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals have not been designed to handle the short read lengths and the errors implied by the "next-gen" sequencing (NGS) technologies. In this paper, we give combinatorial formulations for the SV detection between a reference genome sequence and a next-gen-based, paired-end, whole genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.
Pages: 1270-1278
Page count: 9