Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels

Times cited: 40
Authors
Deelen, Patrick [1, 2]
Zhernakova, Daria V. [1]
de Haan, Mark [1, 2]
van der Sijde, Marijke [1]
Bonder, Marc Jan [1]
Karjalainen, Juha [1]
van der Velde, K. Joeri [1, 2]
Abbott, Kristin M. [1]
Fu, Jingyuan [1]
Wijmenga, Cisca [1]
Sinke, Richard J. [1]
Swertz, Morris A. [1, 2]
Franke, Lude [1]
Affiliations
[1] Univ Groningen, Univ Med Ctr Groningen, Dept Genet, NL-9700 RB Groningen, Netherlands
[2] Univ Groningen, Univ Med Ctr Groningen, Genom Coordinat Ctr, NL-9700 RB Groningen, Netherlands
Source
GENOME MEDICINE | 2015 / Vol. 7
Keywords
ALLELE-SPECIFIC EXPRESSION; REGULATORY VARIATION; GENOME; TRANSCRIPTOME; IMPUTATION; SEQ; SUSCEPTIBILITY; ASSOCIATION; PSCA
DOI
10.1186/s13073-015-0152-4
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: RNA-sequencing (RNA-seq) is a powerful technique for identifying genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given the increasing number of RNA-seq samples in the public domain, we studied to what extent eQTLs and ASE effects can be identified from public RNA-seq data when the genotypes are derived from the RNA-seq reads themselves.

Methods: We downloaded the raw reads for all available human RNA-seq datasets and used them to quantify gene expression. All samples were jointly normalized and subjected to strict quality control. We also derived genotypes from the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. We validated our results using samples for which DNA-seq genotypes were available.

Results: 4,978 public human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples of the same cell type clustered together, suggesting that technical biases due to differing sequencing protocols are limited. In a joint analysis of the 1,262 samples with high-quality genotypes, we identified cis-eQTL effects for 8,034 unique genes (at a false discovery rate ≤ 0.05). eQTL mapping in individual tissues revealed that even a limited number of samples suffices to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels.

Conclusions: By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. Given the exponential growth in the number of publicly available RNA-seq samples, we expect this approach to become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants, aiding the clinical interpretation of exome and genome sequencing.
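The abstract reports eQTLs at a false discovery rate of at most 0.05 and describes ASE analysis, but does not state which statistical tests were used. As a generic illustration only, the Python sketch below assumes a two-sided exact binomial test of reference versus alternative read counts at a heterozygous site (expected reference fraction 0.5), followed by Benjamini-Hochberg FDR adjustment across sites. These choices, the function names, and the example counts are hypothetical and are not the authors' pipeline.

```python
"""Hypothetical sketch: allele-specific expression (ASE) testing at heterozygous
sites, followed by Benjamini-Hochberg false discovery rate (FDR) control.

Assumed (not taken from the paper): an exact two-sided binomial test with an
expected reference-allele fraction of 0.5, and BH adjustment across sites.
"""
from math import exp, lgamma, log


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow
    for large read depths. Requires 0 < p < 1."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k (0 <= k <= n)."""
    if n == 0:
        return 1.0
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    # Small relative tolerance so outcomes tied with k are not lost to rounding.
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """BH-adjusted p-values (q-values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ase_tests(sites: list[tuple[str, int, int]], fdr: float = 0.05):
    """sites: (site_id, ref_reads, alt_reads) per heterozygous site.
    Returns (site_id, ref_fraction, p, q, significant) per site."""
    pvals = [binom_two_sided_p(ref, ref + alt) for _, ref, alt in sites]
    qvals = benjamini_hochberg(pvals)
    results = []
    for (site_id, ref, alt), p, q in zip(sites, pvals, qvals):
        n = ref + alt
        ref_fraction = ref / n if n else float("nan")
        results.append((site_id, ref_fraction, p, q, q <= fdr))
    return results


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    demo = [("site_A", 48, 52), ("site_B", 90, 10), ("site_C", 30, 0)]
    for row in ase_tests(demo):
        print(row)
```

With these made-up counts, the balanced site_A is not significant, while the strongly skewed site_B and site_C pass the 0.05 FDR threshold. A real analysis would also need to handle reference-mapping bias and genotype-calling errors, since a homozygous site miscalled as heterozygous looks like extreme allelic imbalance.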
Pages: 13