How independent are the appearances of n-mers in different genomes?

Cited by: 53
Authors
Fofanov, Y
Luo, Y
Katili, C
Wang, J
Belosludtsev, Y
Powdrill, T
Belapurkar, C
Fofanov, V
Li, TB
Chumakov, S
Pettitt, BM
Affiliations
[1] Univ Houston, Dept Comp Sci, Houston, TX 77204 USA
[2] Univ Houston, Dept Chem, Houston, TX 77204 USA
[3] Vitruvius Biosci, The Woodlands, TX USA
[4] Univ Guadalajara, Dept Phys, Guadalajara 44430, Jalisco, Mexico
Funding
US National Institutes of Health;
Keywords
DOI
10.1093/bioinformatics/bth266
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Analysis of the statistical properties of DNA sequences is important for evolutionary biology as well as for DNA probe and PCR technologies. These technologies, in turn, can be used for organism identification, with applications in the diagnosis of infectious diseases, environmental studies, etc.

Results: We present results of a correlation analysis of the distributions of the presence/absence of short nucleotide subsequences of different lengths ('n-mers', n = 5-20) in more than 1500 microbial and virus genomes, together with five genomes of multicellular organisms (including human). We determine whether a given n-mer is present or absent in a given genome (frequency of presence), rather than the more commonly calculated number of appearances of n-mers in one or more genomes (frequency of appearance). For organisms that are not close relatives of each other, the presence/absence of different 7-20mers in their genomes is not correlated. For close biological relatives, some correlation of the presence of n-mers in this range appears, but it is not as strong as expected. The suppressed correlations among the n-mers present in different genomes lead to the possibility of using random sets of n-mers (with appropriately chosen n) to discriminate genomes of different organisms, and possibly individual genomes of the same species, including human, with a low probability of error.
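The distinction between frequency of presence and frequency of appearance can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `present_nmers` and `presence_correlation` are hypothetical, the sequences are assumed to be single-strand strings over the alphabet A, C, G, T, and the phi coefficient (Pearson correlation of two binary indicators) is an assumed choice, since the abstract does not state which correlation measure was used.

```python
from math import sqrt


def present_nmers(genome: str, n: int) -> set:
    """Return the distinct n-mers that occur at least once in `genome`.

    This records presence/absence only; how many times each n-mer
    appears is deliberately discarded.
    """
    return {genome[i:i + n] for i in range(len(genome) - n + 1)}


def presence_correlation(genome_a: str, genome_b: str, n: int) -> float:
    """Phi coefficient between the presence indicators of two genomes,
    taken over all 4**n possible n-mers (assumes an A/C/G/T alphabet)."""
    total = 4 ** n
    a = present_nmers(genome_a, n)
    b = present_nmers(genome_b, n)
    p_a = len(a) / total          # fraction of n-mers present in genome A
    p_b = len(b) / total          # fraction of n-mers present in genome B
    p_ab = len(a & b) / total     # fraction present in both genomes
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        # Correlation is undefined when a genome contains none or all
        # of the possible n-mers; return 0.0 by convention in this sketch.
        return 0.0
    return (p_ab - p_a * p_b) / denom
```

Under independence, the fraction of n-mers present in both genomes is close to the product of the two individual fractions, so the value returned is near zero; identical sequences give a value of 1 whenever the correlation is defined.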
Pages: 2421-2428
Number of pages: 8
Related papers
16 in total
[1]   Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA [J].
Campbell, A ;
Mrázek, J ;
Karlin, S .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (16) :9184-9189
[2]   Genomic signature: Characterization and classification of species assessed by chaos game representation of sequences [J].
Deschavanne, PJ ;
Giron, A ;
Vilain, J ;
Fagot, G ;
Fertil, B .
MOLECULAR BIOLOGY AND EVOLUTION, 1999, 16 (10) :1391-1399
[3]   Differential display approach to quantitation of environmental stimuli on bacterial gene expression [J].
Fislage, R .
ELECTROPHORESIS, 1998, 19 (04) :613-616
[4]   Primer design for a prokaryotic differential display RT-PCR [J].
Fislage, R ;
Berceanu, M ;
Humboldt, Y ;
Wendt, M ;
Oberender, H .
NUCLEIC ACIDS RESEARCH, 1997, 25 (09) :1830-1835
[5]  
FOFANOV V, 2002, 2002 BIOINF S KECK G, P14
[6]  
FOFANOV V, 2002, 7 STRUCT BIOL S SEAL, P51
[7]   COMPARISONS OF EUKARYOTIC GENOMIC SEQUENCES [J].
KARLIN, S ;
LADUNGA, I .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (26) :12832-12836
[8]   Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes [J].
Karlin, S .
TRENDS IN MICROBIOLOGY, 2001, 9 (07) :335-343
[9]   Compositional biases of bacterial genomes and evolutionary implications [J].
Karlin, S ;
Mrazek, J ;
Campbell, AM .
JOURNAL OF BACTERIOLOGY, 1997, 179 (12) :3899-3913
[10]  
Nakashima H, 1998, DNA Res, V5, P251, DOI 10.1093/dnares/5.5.251