Viral population estimation using pyrosequencing

Cited by: 160
Authors
Eriksson, Nicholas [1]
Pachter, Lior [2]
Mitsuya, Yumi [3]
Rhee, Soo-Yon [3]
Wang, Chunlin [3]
Gharizadeh, Baback [4]
Ronaghi, Mostafa [4]
Shafer, Robert W. [3]
Beerenwinkel, Niko [5]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[3] Stanford Univ, Med Ctr, Div Infect Dis, Stanford, CA 94305 USA
[4] Stanford Univ, Genome Technol Ctr, Palo Alto, CA 94304 USA
[5] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Basel, Switzerland
DOI
10.1371/journal.pcbi.1000074
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation-maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
Pages: 13
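
The abstract describes inferring haplotype frequencies from reads with an expectation-maximization (EM) algorithm. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a simplified model in which reads are already error-corrected, each read is simply marked as compatible or incompatible with each candidate haplotype, and read positions and coverage are ignored. The function name `em_haplotype_frequencies` and the `compat` input format are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def em_haplotype_frequencies(
    compat: Sequence[Sequence[int]],
    n_haplotypes: int,
    n_iter: int = 200,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate haplotype frequencies by EM under a simplified model.

    compat[r] lists the indices of the candidate haplotypes that read r
    is consistent with (an assumed input format, not from the paper).
    """
    # Start from the uniform distribution over candidate haplotypes.
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(n_iter):
        # E-step: split each read fractionally among its compatible
        # haplotypes, in proportion to the current frequency estimates.
        counts = [0.0] * n_haplotypes
        for haps in compat:
            total = sum(freqs[h] for h in haps)
            if total == 0.0:
                continue  # read explained by no haplotype; skip it
            for h in haps:
                counts[h] += freqs[h] / total
        assigned = sum(counts)
        if assigned == 0.0:
            return freqs  # no read could be assigned; keep the estimate
        # M-step: the new frequencies are the normalized expected counts.
        new_freqs = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy data: three reads match only haplotype 0, one matches only
# haplotype 1, and two are ambiguous between the two haplotypes.
reads = [[0], [0], [0], [1], [0, 1], [0, 1]]
estimate = em_haplotype_frequencies(reads, n_haplotypes=2)
```

In this toy input the ambiguous reads carry no information about the split, so the estimate for haplotype 0 converges to about 0.75, the ratio set by the unambiguous reads. A fuller model would also weight each read by the probability of observing it at its position given the haplotype.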