De novo assembly of highly diverse viral populations

Cited by: 149
Authors
Yang, Xiao [1]
Charlebois, Patrick [1]
Gnerre, Sante [1]
Coole, Matthew G. [1]
Lennon, Niall J. [1]
Levin, Joshua Z. [1]
Qu, James [1]
Ryan, Elizabeth M. [1]
Zody, Michael C. [1]
Henn, Matthew R. [1]
Affiliations
[1] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
Source
BMC GENOMICS | 2012 / Volume 13
Funding
National Institutes of Health (USA);
Keywords
NATURALLY INFECTED MOSQUITOS; VIRUS; GENOMES; ALGORITHMS; BIRDS;
DOI
10.1186/1471-2164-13-475
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants is crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes, owing to extensive genetic variability, contaminants, and variable sequence coverage.

Results: We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency, and West Nile viral populations, which represent a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain high base-calling accuracy. The VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/viral-genomics-analysis-software.

Conclusions: We developed VICUNA, a publicly available software tool that enables consensus assembly of ultra-deep sequence data derived from diverse viral populations. Although VICUNA was developed for the analysis of viral populations, it may also prove beneficial for other heterogeneous sequence data sets, such as metagenomic or tumor cell population samples.
Pages: 13