DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition

Cited by: 27
Authors
Audoux, Jerome [1]
Philippe, Nicolas [2, 3]
Chikhi, Rayan [4]
Salson, Mikael [4]
Gallopin, Melina [5]
Gabriel, Marc [5, 6]
Le Coz, Jeremy [5]
Drouineau, Emilie [5]
Commes, Therese [1, 2]
Gautheret, Daniel [5, 6]
Affiliations
[1] Univ Montpellier, INSERM IRMB U1183, Hop St Eloi, 80 Ave Augustin Fliche, F-34295 Montpellier, France
[2] Univ Montpellier, Inst Biol Computat, Montpellier, France
[3] CHRU Montpellier, SeqOne, IRMB, Hop St Eloi, Montpellier, France
[4] Univ Lille, CNRS, INRIA, UMR CRIStAL 9189, F-59000 Lille, France
[5] Univ Paris Saclay, Univ Paris Sud, Inst Integrat Biol Cell, CEA,CNRS, Gif Sur Yvette, France
[6] INSERM, Inst Cancerol, AMMICA, CNRS,US23,UMS3655, Gustave Roussy Canc Campus, Villejuif, France
Source
GENOME BIOLOGY | 2017 / Vol. 18
Keywords
EXPRESSION ANALYSIS; SEQUENCING DATA; GENOME; GENE; QUANTIFICATION; IDENTIFICATION; TRANSCRIPTOME; VARIANTS;
DOI
10.1186/s13059-017-1372-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract
We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
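The core idea in the abstract, decomposing reads into k-mers and comparing their abundance between groups of libraries without any reference, can be illustrated with a minimal sketch. This is not DE-kupl's implementation: the function names `count_kmers` and `log2_fold_changes`, the pseudocount, and the use of a plain log2 ratio are assumptions made for illustration. A real analysis would also need library-size normalization and a statistical test with multiple-testing correction.

```python
from collections import Counter
from math import log2


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a list of reads.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Return log2((b + pseudocount) / (a + pseudocount)) for each k-mer.

    The pseudocount (an illustrative choice) avoids division by zero for
    k-mers that are absent from one of the two conditions.
    """
    kmers = set(counts_a) | set(counts_b)
    return {
        kmer: log2((counts_b[kmer] + pseudocount) / (counts_a[kmer] + pseudocount))
        for kmer in kmers
    }


# Toy example: two tiny "libraries" that differ by one base.
condition_a = count_kmers(["ACGTAC"], k=3)
condition_b = count_kmers(["ACGTTC"], k=3)
changes = log2_fold_changes(condition_a, condition_b)
```

In this toy example, k-mers that overlap the differing base appear in only one condition and receive a non-zero log2 ratio, while shared k-mers score zero. This is the sense in which local variation shows up as differentially abundant k-mers.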
Pages: 15