De novo assembly and genotyping of variants using colored de Bruijn graphs

Cited by: 406
Authors
Iqbal, Zamin [1, 2]
Caccamo, Mario [3]
Turner, Isaac [1]
Flicek, Paul [2]
McVean, Gil [1, 4]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] European Bioinformat Inst, Hinxton, England
[3] Genome Anal Ctr, Norwich, Norfolk, England
[4] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK)
Keywords
DIPLOID GENOME SEQUENCE; COPY-NUMBER VARIATION; HIGH-RESOLUTION HLA; STRUCTURAL VARIATION; POPULATION-SCALE; POLYMORPHIC GENOMES; CIONA-SAVIGNYI; HAPLOTYPE MAP; STRING GRAPH; GENE;
DOI
10.1038/ng.1028
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
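As a rough illustration of the data structure named in the abstract, the following minimal sketch stores each k-mer together with the set of samples ("colors") in which it occurs, so that a variant shows up as a branch whose arms carry different colors. This is not Cortex's implementation: the function names, the toy sequences, and the choice of k = 3 are assumptions made for illustration, and the sketch ignores reverse complements, coverage counts, and sequencing errors.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Map each k-mer to the set of sample names (colors) that contain it."""
    graph = defaultdict(set)
    for color, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                graph[read[i:i + k]].add(color)
    return graph


def successors(graph, kmer):
    """Return the k-mers in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]


def private_kmers(graph, color):
    """Return the k-mers seen only in the given color."""
    return {kmer for kmer, colors in graph.items() if colors == {color}}


# Hypothetical toy data: two samples that differ by a single base (A vs G).
samples = {
    "ref": ["GATTACAGG"],
    "alt": ["GATTGCAGG"],
}
graph = build_colored_graph(samples, k=3)

# The graph branches after "ATT"; each arm of the branch carries one color.
branch = successors(graph, "ATT")
ref_only = private_kmers(graph, "ref")
alt_only = private_kmers(graph, "alt")
```

In this toy case the two arms of the branch rejoin at the shared k-mer "CAG", which is the kind of bubble a variant caller would look for; keeping all samples in one graph is what lets the colors distinguish the alleles.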
Pages: 226-232
Page count: 7