The theory of discovering rare variants via DNA sequencing

Cited by: 9
Authors
Wendl, Michael C. [1]
Wilson, Richard K.
Affiliation
[1] Washington Univ, Genome Ctr, St Louis, MO 63108 USA
Source
BMC GENOMICS | 2009 / Volume 10
Keywords
PATTERNS; GENOME; GENES;
DOI
10.1186/1471-2164-10-485
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective.
Results: We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement.
Conclusion: The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future.
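The abstract's central claim, that the best per-sample redundancy is a constant rather than something tied to the target frequency, can be illustrated with a toy calculation. The sketch below is not the model from the paper: it assumes (hypothetically) that read depth at a site is Poisson with mean rho, that a carrier is detected when at least k reads cover the site, that each sample carries the variant independently with probability phi, and that a fixed project-wide redundancy `budget` is split evenly across `budget / rho` samples. All function and parameter names are illustrative.

```python
import math


def detect_prob(rho, k=2):
    """P(at least k reads cover a site) for Poisson depth with mean rho (toy assumption)."""
    below_k = sum(math.exp(-rho) * rho**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def discovery_prob(phi, budget, rho, k=2):
    """Toy probability that a variant of frequency phi is detected in at least one sample.

    budget is the project-wide redundancy; budget / rho samples are sequenced,
    each at per-sample redundancy rho.
    """
    n_samples = budget / rho
    return 1.0 - (1.0 - phi * detect_prob(rho, k)) ** n_samples


def best_rho(phi, budget, k=2):
    """Grid search for the per-sample redundancy that maximizes discovery_prob."""
    grid = [0.1 * i for i in range(1, 301)]  # candidate rho values 0.1 .. 30.0
    return max(grid, key=lambda r: discovery_prob(phi, budget, r, k))


if __name__ == "__main__":
    # Scale the budget inversely with phi and compare the chosen per-sample redundancy.
    for phi, budget in [(0.01, 1_000), (0.001, 10_000)]:
        rho = best_rho(phi, budget)
        print(phi, budget, rho, discovery_prob(phi, budget, rho))
```

Under these assumptions and for small phi, the log of the miss probability is approximately -(budget / rho) * phi * detect_prob(rho), so the best rho is the one that maximizes detect_prob(rho) / rho, which depends on neither phi nor the budget. That mirrors, in a simplified setting, the constant per-sample redundancy described in the abstract; the paper's own constants and detection rule may differ.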
Pages: 9