OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data

被引:132
作者
Brechtmann, Felix [1 ]
Mertes, Christian [1 ]
Matuseviciute, Agne [1 ]
Yepez, Vicente A. [1 ,2 ]
Avsec, Ziga [1 ,2 ]
Herzog, Maximilian [1 ]
Bader, Daniel M. [1 ]
Prokisch, Holger [3 ,4 ]
Gagneur, Julien [1 ,2 ]
机构
[1] Tech Univ Munich, Dept Informat, Boltzmannstr 3, D-85748 Garching, Germany
[2] Ludwig Maximilians Univ Munchen, Gene Ctr, Dept Biochem, Quantitat Biosci Munich, Feodor Lynen Str 25, D-81377 Munich, Germany
[3] Helmholtz Zentrum Munchen, Inst Human Genet, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[4] Tech Univ Munich, Inst Human Genet, Klinikum Rechts Isar, 13 Ismaninger Str 22, D-81675 Munich, Germany
基金
美国国家卫生研究院;
关键词
DIFFERENTIAL EXPRESSION; TRANSCRIPTOME; IMPACT;
D O I
10.1016/j.ajhg.2018.10.025
中图分类号
Q3 [遗传学];
学科分类号
071007 [遗传学];
摘要
RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
引用
收藏
页码:907 / 917
页数:11
相关论文
共 39 条
[1]
Robust Inference in the Negative Binomial Regression Model with an Application to Falls Data [J].
Aeberhard, William H. ;
Cantoni, Eva ;
Heritier, Stephane .
BIOMETRICS, 2014, 70 (04) :920-931
[2]
A global reference for human genetic variation [J].
Altshuler, David M. ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Donnelly, Peter ;
Eichler, Evan E. ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Green, Eric D. ;
Hurles, Matthew E. ;
Knoppers, Bartha M. ;
Korbel, Jan O. ;
Lander, Eric S. ;
Lee, Charles ;
Lehrach, Hans ;
Mardis, Elaine R. ;
Marth, Gabor T. ;
McVean, Gil A. ;
Nickerson, Deborah A. ;
Wang, Jun ;
Wilson, Richard K. ;
Boerwinkle, Eric ;
Doddapaneni, Harsha ;
Han, Yi ;
Korchina, Viktoriya ;
Kovar, Christie ;
Lee, Sandra ;
Muzny, Donna ;
Reid, Jeffrey G. ;
Zhu, Yiming ;
Chang, Yuqi ;
Feng, Qiang ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Jian, Min ;
Jiang, Hui ;
Jin, Xin ;
Lan, Tianming ;
Li, Guoqing ;
Li, Jingxiang ;
Li, Yingrui ;
Liu, Shengmao ;
Liu, Xiao ;
Lu, Yao ;
Ma, Xuedi ;
Tang, Meifang ;
Wang, Bo .
NATURE, 2015, 526 (7571) :68-+
[3]
Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[4]
[Anonymous], 1974, Outliers in statistical data
[5]
[Anonymous], BIORXIV200681
[6]
[Anonymous], 1966, Multivariate Analysis
[7]
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans [J].
Ardlie, Kristin G. ;
DeLuca, David S. ;
Segre, Ayellet V. ;
Sullivan, Timothy J. ;
Young, Taylor R. ;
Gelfand, Ellen T. ;
Trowbridge, Casandra A. ;
Maller, Julian B. ;
Tukiainen, Taru ;
Lek, Monkol ;
Ward, Lucas D. ;
Kheradpour, Pouya ;
Iriarte, Benjamin ;
Meng, Yan ;
Palmer, Cameron D. ;
Esko, Tonu ;
Winckler, Wendy ;
Hirschhorn, Joel N. ;
Kellis, Manolis ;
MacArthur, Daniel G. ;
Getz, Gad ;
Shabalin, Andrey A. ;
Li, Gen ;
Zhou, Yi-Hui ;
Nobel, Andrew B. ;
Rusyn, Ivan ;
Wright, Fred A. ;
Lappalainen, Tuuli ;
Ferreira, Pedro G. ;
Ongen, Halit ;
Rivas, Manuel A. ;
Battle, Alexis ;
Mostafavi, Sara ;
Monlong, Jean ;
Sammeth, Michael ;
Mele, Marta ;
Reverter, Ferran ;
Goldmann, Jakob M. ;
Koller, Daphne ;
Guigo, Roderic ;
McCarthy, Mark I. ;
Dermitzakis, Emmanouil T. ;
Gamazon, Eric R. ;
Im, Hae Kyung ;
Konkashbaev, Anuar ;
Nicolae, Dan L. ;
Cox, Nancy J. ;
Flutre, Timothee ;
Wen, Xiaoquan ;
Stephens, Matthew .
SCIENCE, 2015, 348 (6235) :648-660
[8]
Benjamini Y, 2001, ANN STAT, V29, P1165
[9]
AUTO-ASSOCIATION BY MULTILAYER PERCEPTRONS AND SINGULAR VALUE DECOMPOSITION [J].
BOURLARD, H ;
KAMP, Y .
BIOLOGICAL CYBERNETICS, 1988, 59 (4-5) :291-294
[10]
A LIMITED MEMORY ALGORITHM FOR BOUND CONSTRAINED OPTIMIZATION [J].
BYRD, RH ;
LU, PH ;
NOCEDAL, J ;
ZHU, CY .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1995, 16 (05) :1190-1208