Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals

Cited by: 373
Authors
Battle, Alexis [1]
Mostafavi, Sara [1]
Zhu, Xiaowei [2]
Potash, James B. [3]
Weissman, Myrna M. [4, 5]
McCormick, Courtney [6]
Haudenschild, Christian D. [7]
Beckman, Kenneth B. [8]
Shi, Jianxin [9]
Mei, Rui [10]
Urban, Alexander E. [2]
Montgomery, Stephen B. [11, 12]
Levinson, Douglas F. [2]
Koller, Daphne [1, 11]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Psychiat & Behav Sci, Stanford, CA 94305 USA
[3] Univ Iowa Hosp & Clin, Dept Psychiat, Iowa City, IA 52242 USA
[4] Columbia Univ, Dept Psychiat, New York, NY 10032 USA
[5] New York State Psychiat Inst & Hosp, New York, NY 10032 USA
[6] Illumina Inc, San Diego, CA 92122 USA
[7] Personalis Inc, Menlo Pk, CA 94025 USA
[8] Univ Minnesota, Biomed Genom Ctr, Minneapolis, MN 55455 USA
[9] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20892 USA
[10] Centrill Biosci Inc, Palo Alto, CA 94303 USA
[11] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
[12] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
Keywords
MOLECULAR INTERACTION DATABASE; HUMAN GENOME; REGULATORY VARIATION; EXPRESSION VARIATION; DISEASE; VARIANTS; COMMON; NETWORK; SEQ; CIS;
DOI
10.1101/gr.155192.113
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation: by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.
Pages: 14-24
Page count: 11