A probabilistic measure for alignment-free sequence comparison

Cited by: 84
Authors
Pham, TD
Zuegg, J
Affiliations
[1] Griffith Univ, Sch Comp & Informat Technol, Nathan, Qld 4111, Australia
[2] Alchemia Ltd, Brisbane, Qld 4122, Australia
DOI
10.1093/bioinformatics/bth426
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Alignment-free sequence comparison methods are still at an early stage of development compared with alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences that requires no alignment. The method constructs a Markov model for each sequence and compares the similarity/dissimilarity between the two models. Results: The method was tested on six DNA sequences, namely the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri, and on one random sequence with the same base composition as thrA from E. coli. These results were compared with those obtained from the CLUSTAL W algorithm (alignment-based) and from the chaos game representation (alignment-free). The method was further tested on a more complex set of 40 DNA sequences and compared with other existing alignment-free sequence similarity measures.
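The idea of comparing two sequences through their Markov models can be illustrated with a minimal sketch. This is not the authors' exact formulation, which the abstract does not give. The first-order model, the add-one pseudocount, the symmetrized log-likelihood ratio, and the function names `fit_markov`, `avg_log_likelihood` and `markov_dissimilarity` are all assumptions made here for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: input sequences are uppercase DNA over A, C, G, T


def fit_markov(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from one sequence.

    The pseudocount (an assumption of this sketch) keeps every probability
    non-zero, so log-likelihoods under the other sequence's model stay finite.
    """
    counts = Counter(zip(seq, seq[1:]))
    model = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[a] = {b: (counts[(a, b)] + pseudocount) / total for b in ALPHABET}
    return model


def avg_log_likelihood(seq, model):
    """Average log-probability per transition of seq under model.

    The initial-state probability is ignored; seq needs at least two symbols.
    """
    n_transitions = len(seq) - 1
    total = sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))
    return total / n_transitions


def markov_dissimilarity(s1, s2):
    """Symmetrized log-likelihood ratio between the models fitted to s1 and s2."""
    m1, m2 = fit_markov(s1), fit_markov(s2)
    # How much worse s1 is explained by s2's model than by its own, and vice versa.
    d12 = avg_log_likelihood(s1, m1) - avg_log_likelihood(s1, m2)
    d21 = avg_log_likelihood(s2, m2) - avg_log_likelihood(s2, m1)
    return 0.5 * (d12 + d21)
```

Under these assumptions, identical sequences score zero and the measure is symmetric in its arguments. Because of the smoothing it is not guaranteed to be non-negative for every pair, so it should be read as a dissimilarity score rather than a metric.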
Pages: 3455-3461
Page count: 7