Alignment-free comparison of genome sequences by a new numerical characterization

Times cited: 35
Authors
Huang, Guohua [1 ]
Zhou, Houqing [1 ]
Li, Yongfan [2 ]
Xu, Lixin [1 ]
Affiliations
[1] Shaoyang Univ, Dept Math, Shaoyang 422000, Hunan, Peoples R China
[2] Hunan First Normal Coll, Changsha 410002, Hunan, Peoples R China
Keywords
Alignment-free comparison; Graphical representation; DNA sequence; Numerical characterization; Phylogenetic tree; 2D GRAPHICAL REPRESENTATION; FEATURE FREQUENCY PROFILES; DNA-SEQUENCES; PROTEIN SEQUENCES; SIMILARITY; PHYLOGENY; DISSIMILARITY;
DOI
10.1016/j.jtbi.2011.04.003
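Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
090105 [Crop Production Systems and Ecological Engineering];
Abstract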
To compare different genome sequences, an alignment-free method has been proposed. First, we presented a new graphical representation of DNA sequences without degeneracy, which is conducive to intuitive comparison of sequences. Then, a new numerical characterization based on the representation was introduced to quantitatively depict the intrinsic nature of genome sequences, and was treated as a 10-dimensional vector. Alignment-free comparison of sequences was performed by computing the distances between the vectors of the corresponding numerical characterizations, which define the evolutionary relationship. Two data sets of DNA sequences were constructed to assess the performance on sequence comparison. The results illustrate the validity of the method well. The new numerical characterization provides a powerful tool for genome comparison. (C) 2011 Elsevier Ltd. All rights reserved.
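The comparison step described in the abstract (map each sequence to a fixed-length vector, then compare vectors by distance) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the 10-dimensional characterization, so the hypothetical `featurize` function below substitutes simple base frequencies, and the Euclidean metric is likewise an assumption, since the abstract only says "distances between vectors".

```python
from collections import Counter
from itertools import combinations
import math


def featurize(seq):
    """Stand-in descriptor: relative frequencies of A, C, G, T.

    ASSUMPTION: this is NOT the paper's 10-dimensional characterization,
    which is derived from its graphical representation and is not given
    in the abstract. Any fixed-length numeric vector could be plugged in.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return [counts[b] / total for b in "ACGT"]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs):
    """Return a symmetric dict-of-dicts of pairwise distances between sequences."""
    names = list(seqs)
    vecs = {name: featurize(seqs[name]) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = euclidean(vecs[a], vecs[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Toy sequences, invented purely for illustration.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "GGGGCCCC"}
D = distance_matrix(toy)
```

A distance matrix of this kind is the usual input to distance-based tree-building methods, which is how a phylogenetic tree could then be derived from the pairwise comparisons.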
Pages: 107-112
Page count: 6