Estimating abundances of retroviral insertion sites from DNA fragment length data

Cited by: 104
Authors
Berry, Charles C. [1]
Gillet, Nicolas A. [2]
Melamed, Anat [3]
Gormley, Niall [4]
Bangham, Charles R. M. [3]
Bushman, Frederic D. [5]
Affiliations
[1] Univ Calif San Diego, Dept Family & Prevent Med, Div Biostat & BioInformat, La Jolla, CA 92093 USA
[2] Univ Liege, Dept Mol & Cellular Epigenet, Liege, Belgium
[3] Univ London Imperial Coll Sci Technol & Med, Wright Fleming Inst, Dept Immunol, London W2 1PG, England
[4] Illumina, Saffron Walden CB10 1XL, Essex, England
[5] Univ Penn, Sch Med, Dept Microbiol, Philadelphia, PA 19104 USA
Funding
Wellcome Trust;
Keywords
GENE-THERAPY; INTEGRATION;
DOI
10.1093/bioinformatics/bts004
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: The relative abundance of retroviral insertions in a host genome is important for understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. It could be estimated from a sample of cells if the host genomic sites of retroviral insertions could be counted directly. When host genomic DNA is randomly broken by sonication and then amplified, amplicons of varying lengths are produced. The number of unique amplicon lengths observed for an insertion site tends to increase with its abundance, which provides a basis for estimating relative abundance. However, as abundance increases, amplicons of the same length arise by chance, leading to a nonlinear relation between the number of unique lengths and relative abundance. Calibrating this relation is made harder by sample-specific variation in the relative frequencies of clones of each length.

Results: A likelihood function is proposed for the discrete lengths observed in each of a collection of insertion sites, and it is maximized with a hybrid expectation-maximization algorithm. Patient data illustrate the method, and simulations show that relative abundance can be estimated with little bias, but that the variation in estimates for highly abundant sites can be large. In replicated patient samples, variation exceeds what the model implies, requiring adjustment as in Efron (2004) or the use of jackknife standard errors. Consequently, it is advantageous to collect replicate samples to strengthen inferences about relative abundance.
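The saturation effect described above can be illustrated numerically. The sketch below is not the authors' hybrid EM estimator. It assumes a simplified Poisson model in which the number of fragments of length j at a site with abundance theta is Poisson(theta * phi[j]), with a known length distribution phi summing to 1 (so theta is the expected number of fragments at the site). The paper, by contrast, has to handle sample-specific length frequencies. Under this assumption, length j is seen at least once with probability 1 - exp(-theta * phi[j]), and a moment-style estimate of theta inverts the expected count of distinct lengths. The function names and the uniform 200-length distribution are hypothetical.

```python
import math


def expected_unique_lengths(theta, phi):
    """Expected number of distinct fragment lengths observed at one site.

    Illustrative assumption (not taken from the paper): the count of
    fragments of length j at a site of abundance theta is
    Poisson(theta * phi[j]), so length j is seen at least once with
    probability 1 - exp(-theta * phi[j]).
    """
    return sum(1.0 - math.exp(-theta * p) for p in phi)


def estimate_abundance(n_unique, phi):
    """Moment-style estimate: solve expected_unique_lengths(theta) = n_unique.

    Uses bisection, which is valid because the expected count is strictly
    increasing in theta. Returns math.inf when n_unique reaches the number
    of lengths with positive probability, since the curve only approaches
    that value asymptotically.
    """
    n_possible = sum(1 for p in phi if p > 0)
    if n_unique <= 0:
        return 0.0
    if n_unique >= n_possible:
        return math.inf
    lo, hi = 0.0, 1.0
    # Expand the upper end of the bracket until it contains the root.
    while expected_unique_lengths(hi, phi) < n_unique:
        hi *= 2.0
    # A fixed number of halvings guarantees termination and ample precision.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_unique_lengths(mid, phi) < n_unique:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical length distribution: 200 equally likely fragment lengths.
    phi = [1.0 / 200] * 200
    for theta in (10, 100, 1000):
        print(theta, round(expected_unique_lengths(theta, phi), 1))
```

With 200 equally likely lengths, a hundredfold increase in abundance (theta from 10 to 1000) yields only about a twentyfold increase in expected distinct lengths. Near the ceiling of 200 the curve is nearly flat, so a small change in the observed count moves the inverted estimate a great deal. This is consistent with the abstract's observation that estimates for highly abundant sites can vary widely.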
Pages: 755-762
Page count: 8
References
28 in total
[1]   Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries [J].
Aird, Daniel ;
Ross, Michael G. ;
Chen, Wei-Sheng ;
Danielsson, Maxwell ;
Fennell, Timothy ;
Russ, Carsten ;
Jaffe, David B. ;
Nusbaum, Chad ;
Gnirke, Andreas .
GENOME BIOLOGY, 2011, 12 (02)
[2]   [Anonymous], 2001, PRACTICAL GUIDE SPLI
[3]   THE MULTINOMIAL-POISSON TRANSFORMATION [J].
BAKER, SG .
STATISTICIAN, 1994, 43 (04) :495-504
[4]   A method to sequence and quantify DNA integration for monitoring outcome in gene therapy [J].
Brady, Troy ;
Roth, Shoshannah L. ;
Malani, Nirav ;
Wang, Gary P. ;
Berry, Charles C. ;
Leboulch, Philippe ;
Hacein-Bey-Abina, Salima ;
Cavazzana-Calvo, Marina ;
Papapetrou, Eirini P. ;
Sadelain, Michel ;
Savilahti, Harri ;
Bushman, Frederic D. .
NUCLEIC ACIDS RESEARCH, 2011, 39 (11) :e72
[5]   Transfusion independence and HMGA2 activation after gene therapy of human β-thalassaemia [J].
Cavazzana-Calvo, Marina ;
Payen, Emmanuel ;
Negre, Olivier ;
Wang, Gary ;
Hehir, Kathleen ;
Fusil, Floriane ;
Down, Julian ;
Denaro, Maria ;
Brady, Troy ;
Westerman, Karen ;
Cavallesco, Resy ;
Gillet-Legrand, Beatrix ;
Caccavelli, Laure ;
Sgarra, Riccardo ;
Maouche-Chretien, Leila ;
Bernaudin, Francoise ;
Girot, Robert ;
Dorazio, Ronald ;
Mulder, Geert-Jan ;
Polack, Axel ;
Bank, Arthur ;
Soulier, Jean ;
Larghero, Jerome ;
Kabbara, Nabil ;
Dalle, Bruno ;
Gourmel, Bernard ;
Socie, Gerard ;
Chretien, Stany ;
Cartier, Nathalie ;
Aubourg, Patrick ;
Fischer, Alain ;
Cornetta, Kenneth ;
Galacteros, Frederic ;
Beuzard, Yves ;
Gluckman, Eliane ;
Bushman, Frederick ;
Hacein-Bey-Abina, Salima ;
Leboulch, Philippe .
NATURE, 2010, 467 (7313) :318-U94
[6]   Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample [J].
Chao, A ;
Shen, TJ .
ENVIRONMENTAL AND ECOLOGICAL STATISTICS, 2003, 10 (04) :429-443
[7]   ESTIMATING THE POPULATION-SIZE FOR CAPTURE RECAPTURE DATA WITH UNEQUAL CATCHABILITY [J].
CHAO, A .
BIOMETRICS, 1987, 43 (04) :783-791
[8]   ESTIMATING THE NUMBER OF CLASSES VIA SAMPLE COVERAGE [J].
CHAO, A ;
LEE, SM .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1992, 87 (417) :210-217
[9]   Vector integration is nonrandom and clustered and influences the fate of lymphopoiesis in SCID-X1 gene therapy [J].
Deichmann, Annette ;
Hacein-Bey-Abina, Salima ;
Schmidt, Manfred ;
Garrigue, Alexandrine ;
Brugman, Martijn H. ;
Hu, Jingqiong ;
Glimm, Hanno ;
Gyapay, Gabor ;
Prum, Bernard ;
Fraser, Christopher C. ;
Fischer, Nicolas ;
Schwarzwaelder, Kerstin ;
Siegler, Maria-Luise ;
de Ridder, Dick ;
Pike-Overzet, Karin ;
Howe, Steven J. ;
Thrasher, Adrian J. ;
Wagemaker, Gerard ;
Abel, Ulrich ;
Staal, Frank J. T. ;
Delabesse, Eric ;
Villeval, Jean-Luc ;
Aronow, Bruce ;
Hue, Christophe ;
Prinz, Claudia ;
Wissler, Manuela ;
Klanke, Chuck ;
Weissenbach, Jean ;
Alexander, Ian ;
Fischer, Alain ;
von Kalle, Christof ;
Cavazzana-Calvo, Marina .
JOURNAL OF CLINICAL INVESTIGATION, 2007, 117 (08) :2225-2232
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38