Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content

Cited by: 41
Authors
Sandberg, R [1]
Bränden, CI
Ernberg, I
Cöster, J
Affiliations
[1] Karolinska Inst, Ctr Microbiol & Tumor Biol, S-17177 Stockholm, Sweden
[2] Virtual Genet Lab AB, S-17177 Stockholm, Sweden
Keywords
C, cytosine; G, guanine; RSCU, relative synonymous codon usage; NBC, naive Bayesian classifier; AU, arbitrary units
DOI
10.1016/S0378-1119(03)00581-X
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Each prokaryote has a unique genomic signature, as evidenced by a set of species-specific frequencies of short oligonucleotides. With respect to genomic signatures, a bacterial genome is homogeneous, and the variation within a genome is smaller than the variation between genomes of different species. This study quantifies the species-specificity of genomic signatures in the complete genomes of 57 prokaryotes. The species-specificity in the genomic signature was related to the quantification of other sequence biases, such as G + C content, synonymous codon choice and amino acid usage. The results confirm that the genomic signature is genome-wide, with high species-specificity in both coding and non-coding regions. In coding regions, the species-specific bias in synonymous codon choice was comparable to the genomic signature, while the bias in amino acid usage captured only about 50% of the species-specific bias in the genomic signature. A correlation between the species-specificity in synonymous codon choice and amino acid usage was identified, in which proteins with species-specific amino acid usage were also coded with species-specific synonymous codon choice. However, we demonstrated that the G + C content captures only approximately 40% of the species-specificity in the genomic signature, and is insufficient to explain the species-specificity in the non-coding regions. Thus, the species-specific bias in non-coding regions remains largely unknown. Further, we compared the genomic signature in relation to phylogenetic distance. This was performed in order to illustrate the feasibility of a hierarchical classification scheme in future applications of the described classification methodology in screening for horizontal gene transfer and biodiversity studies. © 2003 Elsevier Science B.V. All rights reserved.
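The abstract describes a genomic signature as a set of species-specific frequencies of short oligonucleotides, and the keyword list mentions a naive Bayesian classifier (NBC). The Python sketch below illustrates that general idea only; it is not the authors' implementation. The oligonucleotide length `k=2`, the add-one pseudocount, the equal priors across species, and the function names `kmer_profile`, `log_likelihood`, `classify` and `gc_content` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, pseudocount=1.0):
    """Smoothed relative frequencies of all 4**k oligonucleotides in seq.

    Overlapping windows are counted; windows containing characters other
    than A, C, G, T are ignored. The pseudocount (an assumption here)
    keeps every frequency strictly positive.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def log_likelihood(fragment, profile, k=2):
    """Naive Bayes score: sum of log frequencies of the fragment's k-mers."""
    fragment = fragment.upper()
    score = 0.0
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if kmer in profile:  # skip windows with ambiguous bases
            score += math.log(profile[kmer])
    return score


def classify(fragment, profiles, k=2):
    """Return the species whose profile scores highest (equal priors assumed)."""
    return max(profiles, key=lambda name: log_likelihood(fragment, profiles[name], k))


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


if __name__ == "__main__":
    # Toy "genomes" invented for this example, not real sequence data.
    genomes = {
        "gc_rich": "GCGCGGCCGCGCCGGC" * 20,
        "at_rich": "ATATTAAATTATAATA" * 20,
    }
    profiles = {name: kmer_profile(seq) for name, seq in genomes.items()}
    print(classify("GCCGGCGC", profiles))
    print(classify("ATTATAAT", profiles))
```

Under these assumptions, a fragment is assigned to the genome whose oligonucleotide frequencies make it most probable. G + C content alone corresponds to a much coarser summary of the same sequence, which is consistent with the abstract's finding that it captures only part of the signature.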
Pages: 35-42
Number of pages: 8