A basic analysis toolkit for biological sequences

Times cited: 2
Authors
Giancarlo, Raffaele [1]
Siragusa, Alessandro [1]
Siragusa, Enrico [1]
Utro, Filippo [1]
Affiliation
[1] Univ Palermo, Dipartimento Matemat Applicaz, I-90133 Palermo, Italy
DOI
10.1186/1748-7188-2-10
Chinese Library Classification
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. The library also supports filtering operations that select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to bridge this gap between algorithmic theory and practice by providing an extensible, easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ and Perl library functions. It can be interfaced with BioPerl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
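To illustrate the first task named in the abstract, local alignment via approximate string matching, here is a minimal sketch of the classic Sellers dynamic program for the k-differences problem (unit-cost substitutions, insertions and deletions). It is written in Python for brevity, is not taken from BATS, and does not reflect the BATS C/C++ or Perl interfaces; the function name approx_match_ends and its return convention (0-based end positions in the text) are hypothetical choices made for this example.

```python
from typing import List


def approx_match_ends(pattern: str, text: str, k: int) -> List[int]:
    """Hypothetical helper (not part of BATS): Sellers' k-differences DP.

    Returns every 0-based position j in `text` such that some substring of
    `text` ending at j matches `pattern` with at most k unit-cost edits.
    Runs in O(len(pattern) * len(text)) time using one DP column.
    """
    m = len(pattern)
    # col[i] = fewest edits turning pattern[:i] into some substring of text
    # that ends at the current position. Before any text is read, this is i.
    col = list(range(m + 1))
    ends: List[int] = []
    for j, t in enumerate(text):
        prev_diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra text character (insertion)
                col[i - 1] + 1,    # skipped pattern character (deletion)
            )
            prev_diag = col[i]     # old D[i][j-1] becomes next diagonal
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, approx_match_ends("abc", "xxabcxx", 0) reports only the exact occurrence ending at position 4. Keeping a single column makes memory O(m) for a pattern of length m, while time stays O(mn) for a text of length n; asymptotically faster algorithms for this problem exist.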
Pages: 16