HTSeq - a Python framework to work with high-throughput sequencing data

Cited by: 15141
Authors
Anders, Simon [1]
Pyl, Paul Theodor [1]
Huber, Wolfgang [1]
Affiliations
[1] European Mol Biol Lab, Genome Biol Unit, D-69111 Heidelberg, Germany
Keywords
BIOCONDUCTOR; BIOLOGY;
DOI
10.1093/bioinformatics/btu638
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
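The counting step the abstract attributes to htseq-count (counting the overlap of reads with genes) can be illustrated with a minimal pure-Python sketch. It does not use HTSeq or its API. The toy annotation, the half-open coordinate convention, the function names, and the `no_feature` / `ambiguous` labels are all assumptions made for illustration: a read is credited to a gene only when it overlaps exactly one gene.

```python
from collections import Counter

# Hypothetical toy annotation: (chromosome, start, end, gene_id).
# Intervals are assumed to be half-open: [start, end).
GENES = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 300, "geneB"),
    ("chr2", 0, 50, "geneC"),
]


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of gene ids whose interval overlaps [start, end)."""
    return {
        gene_id
        for g_chrom, g_start, g_end, gene_id in genes
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting zero or several genes get special labels."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if not hits:
            counts["no_feature"] += 1   # illustrative label, not taken from the source
        elif len(hits) > 1:
            counts["ambiguous"] += 1    # illustrative label, not taken from the source
        else:
            counts[next(iter(hits))] += 1
    return counts


# Hypothetical aligned reads: (chromosome, start, end).
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 90),    # overlaps no gene
    ("chr2", 40, 70),    # overlaps geneC
]
counts = count_reads(reads)
```

The linear scan over all genes keeps the sketch short. A coordinate-indexed structure, which the abstract says HTSeq provides for querying by genomic coordinates, would replace that scan in real use.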
Pages: 166-169
Page count: 4
Related papers
14 in total
[1]
Beazley DM, 1996, PROCEEDINGS OF THE FOURTH ANNUAL TCL/TK WORKSHOP, P129
[2]
Cython: The Best of Both Worlds [J].
Behnel, Stefan ;
Bradshaw, Robert ;
Citro, Craig ;
Dalcin, Lisandro ;
Seljebotn, Dag Sverre ;
Smith, Kurt .
COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (02) :31-39
[3]
Trimmomatic: a flexible trimmer for Illumina sequence data [J].
Bolger, Anthony M. ;
Lohse, Marc ;
Usadel, Bjoern .
BIOINFORMATICS, 2014, 30 (15) :2114-2120
[4]
Biopython: freely available Python tools for computational molecular biology and bioinformatics [J].
Cock, Peter J. A. ;
Antao, Tiago ;
Chang, Jeffrey T. ;
Chapman, Brad A. ;
Cox, Cymon J. ;
Dalke, Andrew ;
Friedberg, Iddo ;
Hamelryck, Thomas ;
Kauff, Frank ;
Wilczynski, Bartek ;
de Hoon, Michiel J. L. .
BIOINFORMATICS, 2009, 25 (11) :1422-1423
[5]
Pybedtools: a flexible Python library for manipulating genomic datasets and annotations [J].
Dale, Ryan K. ;
Pedersen, Brent S. ;
Quinlan, Aaron R. .
BIOINFORMATICS, 2011, 27 (24) :3423-3424
[6]
RNA-Seq Gene Profiling - A Systematic Empirical Comparison [J].
Fonseca, Nuno A. ;
Marioni, John ;
Brazma, Alvis .
PLOS ONE, 2014, 9 (09)
[7]
Bioconductor: open software development for computational biology and bioinformatics [J].
Gentleman, RC ;
Carey, VJ ;
Bates, DM ;
Bolstad, B ;
Dettling, M ;
Dudoit, S ;
Ellis, B ;
Gautier, L ;
Ge, YC ;
Gentry, J ;
Hornik, K ;
Hothorn, T ;
Huber, W ;
Iacus, S ;
Irizarry, R ;
Leisch, F ;
Li, C ;
Maechler, M ;
Rossini, AJ ;
Sawitzki, G ;
Smith, C ;
Smyth, G ;
Tierney, L ;
Yang, JYH ;
Zhang, JH .
GENOME BIOLOGY, 2004, 5 (10)
[8]
Josuttis Nicolai M., 1999, C STANDARD LIB
[9]
Software for Computing and Annotating Genomic Ranges [J].
Lawrence, Michael ;
Huber, Wolfgang ;
Pages, Herve ;
Aboyoun, Patrick ;
Carlson, Marc ;
Gentleman, Robert ;
Morgan, Martin T. ;
Carey, Vincent J. .
PLOS COMPUTATIONAL BIOLOGY, 2013, 9 (08)
[10]
featureCounts: an efficient general purpose program for assigning sequence reads to genomic features [J].
Liao, Yang ;
Smyth, Gordon K. ;
Shi, Wei .
BIOINFORMATICS, 2014, 30 (07) :923-930