A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications

Cited by: 114
Authors
Deng, Mo [1]
Yu, Chenglong [2]
Liang, Qian [1]
He, Rong L. [3]
Yau, Stephen S.-T. [1]
Affiliations
[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60680 USA
[2] Chinese Univ Hong Kong, Inst Math Sci, Shatin, Hong Kong, Peoples R China
[3] Chicago State Univ, Dept Biol Sci, Chicago, IL USA
Funding
U.S. National Science Foundation
Keywords
DNA-SEQUENCES; EVOLUTION; HUMANS; EMERGENCE; VIRUSES; H1;
DOI
10.1371/journal.pone.0017293
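Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification
070301 [Inorganic Chemistry]; 070403 [Astrophysics]; 070507 [Natural Resources and Territorial Spatial Planning]; 090105 [Crop Production Systems and Ecological Engineering];
Abstract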
Background: Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyze the phylogeny of a whole genome. This motivates us to create a new approach to characterizing genetic sequences.

Methodology: To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV), and mammalian mitochondrial genomes. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses.

Conclusions: Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than with multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
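
The Methodology above can be illustrated with a minimal sketch. The abstract does not spell out the components of the natural vector, so the version below is an assumption: for each nucleotide it records the count, the mean position, and a normalized second central moment of the positions, giving a 12-dimensional vector. The function names `natural_vector` and `sequence_distance`, the choice to stop at the second moment, and the use of Euclidean distance are all hypothetical choices for illustration. A vector truncated in this way is not guaranteed to be one-to-one with the sequence.

```python
import math


def natural_vector(seq, alphabet="ACGT"):
    """Return a 12-dimensional vector: counts, mean positions, and
    normalized second central moments for each nucleotide.

    Assumed definition for illustration; positions are 1-based.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in alphabet:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # Nucleotide absent: use zeros for all three components.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        # Second central moment, scaled by n_k * n (assumed normalization).
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def sequence_distance(seq_a, seq_b):
    """Euclidean distance between the natural vectors of two sequences."""
    return math.dist(natural_vector(seq_a), natural_vector(seq_b))


if __name__ == "__main__":
    print(natural_vector("ACGT"))
    print(sequence_distance("AC", "CA"))
```

Because each sequence is mapped to a fixed-length vector independently of the others, vectors can be computed once and stored, and adding a new sequence only requires computing one more vector. This matches the storage argument made in the Conclusions.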
Pages: 9