Design of Association Studies with Pooled or Un-pooled Next-Generation Sequencing Data

Cited by: 68
Authors
Kim, Su Yeon [1 ,2 ]
Li, Yingrui [3 ]
Guo, Yiran [3 ]
Li, Ruiqiang [3 ]
Holmkvist, Johan [4 ]
Hansen, Torben [4 ,5 ]
Pedersen, Oluf [4 ,6 ,7 ]
Wang, Jun [3 ,8 ]
Nielsen, Rasmus [1 ,2 ,3 ,8 ]
Affiliations
[1] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Beijing Genom Inst, Shenzhen, Peoples R China
[4] Hagedorn Res Inst, Gentofte, Denmark
[5] Univ So Denmark, Fac Hlth Sci, Odense, Denmark
[6] Univ Aarhus, Fac Hlth Sci, Aarhus, Denmark
[7] Univ Copenhagen, Inst Biomed Sci, Copenhagen, Denmark
[8] Univ Copenhagen, Dept Biol, Copenhagen, Denmark
Keywords
pooled samples; association mapping; rare allele; optimal design; next-generation sequencing; GENOME-WIDE ASSOCIATION; ALLELE FREQUENCY ESTIMATION; LARGE-SCALE ASSOCIATION; MULTIPLE RARE ALLELES; COMPLEX TRAITS; COMMON DISEASES; DNA; VARIANTS; SUSCEPTIBILITY; DISCOVERY;
DOI
10.1002/gepi.20501
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 [Genetics];
Abstract
Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF < 5%), which are not included in the common genotyping platforms, contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which allows the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon capture. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that, at a fixed cost, sequencing many individuals at shallower depth with larger pool sizes achieves higher power than sequencing a small number of individuals at higher depth with smaller pool sizes, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing. Genet. Epidemiol. 34:479-491, 2010. © 2010 Wiley-Liss, Inc.
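The abstract mentions a likelihood ratio statistic that accounts for sequencing error. The following is a minimal sketch of that general idea, not the authors' statistic: it assumes a single site, one case pool and one control pool, a known symmetric per-read error rate, and reads modeled as binomial draws, ignoring pooling variance. The function names and the default error rate of 0.01 are illustrative assumptions.

```python
import math


def binom_loglik(k, n, q):
    """Binomial log-likelihood of k minor-allele reads out of n (constant term dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(q)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - q)
    return ll


def clipped_mle(k, n, err):
    """MLE of the per-read minor-allele probability q = f(1-err) + (1-f)err.

    Because the true frequency f lies in [0, 1], q is constrained to
    [err, 1-err]; the binomial likelihood is concave in q, so the
    constrained MLE is the raw proportion clipped to that interval.
    """
    return min(max(k / n, err), 1.0 - err)


def lrt_statistic(k_case, n_case, k_ctrl, n_ctrl, err=0.01):
    """Likelihood ratio statistic for equal allele frequency in cases and controls."""
    # Alternative: separate frequencies in the two pools.
    q_case = clipped_mle(k_case, n_case, err)
    q_ctrl = clipped_mle(k_ctrl, n_ctrl, err)
    ll_alt = binom_loglik(k_case, n_case, q_case) + binom_loglik(k_ctrl, n_ctrl, q_ctrl)
    # Null: one shared frequency for both pools.
    q_null = clipped_mle(k_case + k_ctrl, n_case + n_ctrl, err)
    ll_null = binom_loglik(k_case, n_case, q_null) + binom_loglik(k_ctrl, n_ctrl, q_null)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (ll_alt - ll_null))


if __name__ == "__main__":
    # Hypothetical read counts: 40 of 1000 minor-allele reads in cases, 10 of 1000 in controls.
    print(lrt_statistic(40, 1000, 10, 1000))
```

In the standard asymptotic setting such a statistic would be compared against a chi-square distribution with one degree of freedom, although clipping at the error-rate boundary can change that reference distribution. A realistic analysis would also need to model the pooling variance that the abstract says was evaluated empirically.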
Pages: 479-491
Number of pages: 13