MolBioLib: a C++11 framework for rapid development and deployment of bioinformatics tasks

Cited by: 8
Authors
Ohsumi, Toshiro K. [1, 2]
Borowsky, Mark L. [1, 2]
Affiliations
[1] Massachusetts Gen Hosp, Dept Mol Biol, Richard B Simches Res Ctr, Boston, MA 02114 USA
[2] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
Keywords
GENOME; TOOLKIT; ARACHNE; RNAS
DOI
10.1093/bioinformatics/bts458
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data and useful for many other purposes. MolBioLib is designed to work with common file formats and data types used both in genomic analysis and in general data analysis. A central relational-database-like Table class is a flexible and powerful object for intuitively representing and working with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative single-nucleotide polymorphisms in whole genome sequencing, detect balanced chromosomal rearrangements and compute enrichment of messenger RNAs (mRNAs) on microtubules, with each application typically requiring under 200 lines of code. MolBioLib includes programs to perform a wide variety of analysis tasks, such as computing read coverage, annotating genomic intervals and novel peak calling with a wavelet algorithm. Although MolBioLib was designed primarily for bioinformatics purposes, much of its functionality is applicable to a wide range of problems. Complete documentation and an extensive automated test suite are provided.
Pages: 2412-2416
Number of pages: 5