RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries

Cited by: 78
Authors
Habegger, Lukas [1, 2]
Sboner, Andrea [1, 2]
Gianoulis, Tara A. [3, 4]
Rozowsky, Joel [2]
Agarwal, Ashish [2, 5]
Snyder, Michael [6]
Gerstein, Mark [1, 2, 5]
Affiliations
[1] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT USA
[3] Wyss Inst Biol Inspired Engn Harvard, Boston, MA USA
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[5] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[6] Stanford Univ, Dept Genet, Sch Med, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA)
Keywords
TRANSCRIPTOMES; REVEALS;
DOI
10.1093/bioinformatics/btq643
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses.
Pages: 281-283
Page count: 3