SUBSAMPLING METHODS FOR GENOMIC INFERENCE

Cited by: 48
Authors
Bickel, Peter J. [1]
Boley, Nathan [1]
Brown, James B. [1]
Huang, Haiyan [1]
Zhang, Nancy R. [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Keywords
Genome Structure Correction (GSC); subsampling; piecewise stationary model; segmentation-block bootstrap; feature overlap; FACTOR-BINDING SITES; DNA-SEQUENCES; COPY NUMBER; BOOTSTRAP; SEGMENTATION; IDENTIFICATION; 1-PERCENT; MODELS; CHOICE
DOI
10.1214/10-AOAS363
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of p-values and confidence bounds for statistics defined on the genome. Currently, such computation is commonly achieved through ad hoc simulation measures. The method of randomization, which is at the heart of these simulation procedures, can significantly affect the resulting statistical conclusions. Most simulation schemes introduce a variety of hidden assumptions regarding the nature of the randomness in the data, resulting in a failure to capture biologically meaningful relationships. To address the need for a method of assessing the significance of observations within large-scale genomic studies, where there often exists a complex dependency structure between observations, we propose a unified solution built upon a data subsampling approach. We propose a piecewise stationary model for genome sequences and show that the subsampling approach gives correct answers under this model. We illustrate the method on three simulation studies and two real data examples.
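
As a rough illustration of the general idea of subsampling a dependent sequence, the sketch below computes a confidence interval for the overlap fraction of two binary feature tracks by repeatedly drawing contiguous blocks. It is a generic block-subsampling sketch, not the paper's GSC procedure: it treats the whole sequence as stationary and does not implement the piecewise stationary model or the segmentation-block bootstrap. The function names, the square-root scaling, and the simple quantile rule are all assumptions made for illustration.

```python
import math
import random


def overlap_fraction(a, b):
    """Fraction of positions where both binary (0/1) tracks equal 1."""
    return sum(x & y for x, y in zip(a, b)) / len(a)


def block_subsample_ci(a, b, block_len, n_draws=1000, alpha=0.05, seed=0):
    """Hypothetical helper: approximate (1 - alpha) confidence interval
    for the overlap fraction using contiguous block subsampling.

    Assumes the tracks are stationary over their full length, which is
    a simplification relative to a piecewise stationary model.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("tracks must have the same length")
    if not 0 < block_len <= n:
        raise ValueError("block_len must be in 1..len(a)")

    rng = random.Random(seed)
    full = overlap_fraction(a, b)

    # Scaled deviations of the block statistic from the full-sequence one.
    devs = []
    for _ in range(n_draws):
        start = rng.randrange(n - block_len + 1)
        end = start + block_len
        sub = overlap_fraction(a[start:end], b[start:end])
        devs.append(math.sqrt(block_len) * (sub - full))
    devs.sort()

    # Empirical quantiles of the scaled deviations.
    lo_q = devs[int((alpha / 2) * (n_draws - 1))]
    hi_q = devs[int((1 - alpha / 2) * (n_draws - 1))]

    # Rescale from block length to full length and invert.
    root_n = math.sqrt(n)
    return full, (full - hi_q / root_n, full - lo_q / root_n)
```

Keeping each draw contiguous preserves the local dependence between neighboring positions, which shuffling individual positions would destroy. The block length is left to the caller here; choosing it well is a separate problem that this sketch does not address.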
Pages: 1660-1697
Number of pages: 38