RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application

Times cited: 61
Authors
D'Antonio, Mattia [1]
De Meo, Paolo D'Onorio [1]
Pallocca, Matteo [2]
Picardi, Ernesto [3]
D'Erchia, Anna Maria [3]
Calogero, Raffaele A. [4]
Castrignano, Tiziana [1]
Pesole, Graziano [3, 5, 6]
Affiliations
[1] CINECA, Consorzio Interuniv Calcolo Automat, Bologna, Italy
[2] Italian Natl Canc Inst Regina Elena, Translat Oncogen Unit, Rome, Italy
[3] Univ Bari, Dipartimento Biosci Biotecnol & Biofarmaceut, Bari, Italy
[4] Univ Turin, Dipartimento Biotecnol & Sci Salute, Turin, Italy
[5] CNR, Ist Biomembrane & Bioenerget, Bari, Italy
[6] Ctr Excellence Genom CEGBA, Bari, Italy
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; ALIGNMENT; TRANSCRIPTS; SEQUENCES; QUANTIFICATION; IDENTIFICATION; ULTRAFAST; DATABASE; GENOMES; FORMAT
DOI
10.1186/1471-2164-16-S6-S3
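Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering]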
Abstract
Background: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing (NGS) platforms, which allow massive, low-cost sequencing of selected RNA fractions and also provide information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges for efficiently analyzing and storing sequence data and results.
Methods: To provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. The pipeline integrates state-of-the-art bioinformatics tools for RNA-Seq analysis with in-house developed scripts to offer the user a comprehensive data analysis strategy. RAP can perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with TopHat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). The pipeline can also identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing patterns and polyadenylation site usage (using Cuffdiff2 and DESeq).
Results: Through a user-friendly web interface, the RAP workflow can be customized by the user and is automatically executed on our cloud computing environment. This strategy gives users access to bioinformatics tools and computational resources without requiring specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that help users browse, filter and export the analyzed data according to their needs.
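The abstract mentions calling statistically significant differences with Cuffdiff2 and DESeq, and the reference list includes Benjamini and Hochberg's false discovery rate procedure. Tools of this kind typically report p-values adjusted for multiple testing. As an illustration only, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up adjustment, assuming plain per-gene p-values as input. It is not RAP's code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Illustrative sketch (hypothetical helper, not part of RAP).
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (illustrative, not RAP output).
pvals = [0.2, 0.001, 0.6, 0.024, 0.012]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q <= 0.05]
print([round(q, 4) for q in qvals], significant)
```

Sorting once and taking a running minimum from the largest p-value downward gives the monotone adjusted values in O(m log m) time. R's `p.adjust(p, method = "BH")` computes the same adjustment.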
Pages: 11
References (47 in total)
[1]
Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[2]
[Anonymous], FastQC, a quality control tool for high throughput sequence data
[3]
Patterns of variant polyadenylation signal usage in human genes [J].
Beaudoing, E ;
Freier, S ;
Wyatt, JR ;
Claverie, JM ;
Gautheret, D .
GENOME RESEARCH, 2000, 10 (07) :1001-1010
[4]
CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]
Blankenberg, Daniel, 2010, Curr Protoc Mol Biol, Chapter 19, DOI 10.1002/0471142727.mb1910s89
[6]
NGS-Trex: Next Generation Sequencing Transcriptome profile explorer [J].
Boria, Ilenia ;
Boatti, Lara ;
Pesole, Graziano ;
Mignone, Flavio .
BMC BIOINFORMATICS, 2013, 14
[7]
Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses [J].
Cabili, Moran N. ;
Trapnell, Cole ;
Goff, Loyal ;
Koziol, Magdalena ;
Tazon-Vega, Barbara ;
Regev, Aviv ;
Rinn, John L. .
GENES & DEVELOPMENT, 2011, 25 (18) :1915-1927
[8]
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants [J].
Cock, Peter J. A. ;
Fields, Christopher J. ;
Goto, Naohisa ;
Heuer, Michael L. ;
Rice, Peter M. .
NUCLEIC ACIDS RESEARCH, 2010, 38 (06) :1767-1771
[9]
WEP: a high-performance analysis pipeline for whole-exome data [J].
D'Antonio, Mattia ;
De Meo, Paolo D'Onorio ;
Paoletti, Daniele ;
Elmi, Berardino ;
Pallocca, Matteo ;
Sanna, Nico ;
Picardi, Ernesto ;
Pesole, Graziano ;
Castrignano, Tiziana .
BMC BIOINFORMATICS, 2013, 14
[10]
mRNA expression, splicing and editing in the embryonic and adult mouse cerebral cortex [J].
Dillman, Allissa A. ;
Hauser, David N. ;
Gibbs, J. Raphael ;
Nalls, Michael A. ;
McCoy, Melissa K. ;
Rudenko, Iakov N. ;
Galter, Dagmar ;
Cookson, Mark R. .
NATURE NEUROSCIENCE, 2013, 16 (04) :499-U178