Structural characterization of genomes by large scale sequence-structure threading

Cited by: 6
Authors
Cherkasov, A [1]
Jones, SJM
Affiliations
[1] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
[2] Univ British Columbia, Fac Med, Vancouver, BC, Canada
DOI
10.1186/1471-2105-5-37
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Using sequence-structure threading, we have conducted a structural characterization of the complete proteomes of 37 archaeal, bacterial and eukaryotic organisms (including worm, fly, mouse and human), totaling 167,888 genes. Results: The reported data represent the first fairly general evaluation of the performance of full sequence-structure threading on multiple genomes, providing an opportunity to assess its general applicability to large-scale studies. According to the estimated results, sequence-structure threading has assigned protein folds to more than 60% of eukaryotic, 68% of archaeal and 70% of bacterial proteomes. The repertoires of protein classes, architectures, topologies and homologous superfamilies (according to the CATH 2.4 classification) have been established for distant organisms and superkingdoms. The average abundance of CATH classes was found to decrease from "alpha and beta" to "mainly beta", followed by "mainly alpha" and "few secondary structures". The 3-Layer (aba) Sandwich has been characterized as the most abundant protein architecture and the Rossmann fold as the most common topology. Conclusion: The analysis of genomic occurrences of CATH 2.4 protein homologous superfamilies and topologies has revealed the power-law character of their distributions. The corresponding double-logarithmic "frequency-genomic occurrence" dependences, characteristic of scale-free systems, have been established for individual organisms and for the three superkingdoms.
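The double-logarithmic "frequency-genomic occurrence" dependence mentioned in the conclusion can be illustrated with a minimal sketch. It assumes that each superfamily has a single occurrence count per genome and that the exponent is estimated by an ordinary least-squares line through the log-log points; the abstract does not state the authors' actual fitting procedure. The function name `loglog_slope` and the input counts are hypothetical, and the counts are synthetic values chosen to follow an exact power law, not data from the paper.

```python
import math
from collections import Counter


def loglog_slope(occurrences):
    """Fit log10(frequency) = slope * log10(occurrence) + intercept.

    `occurrences` holds one genomic occurrence count per superfamily
    (hypothetical input format, assumed for illustration only).
    """
    # frequency[k] = number of superfamilies that occur exactly k times
    frequency = Counter(occurrences)
    xs = [math.log10(k) for k in frequency]
    ys = [math.log10(frequency[k]) for k in frequency]

    # Ordinary least-squares line through the log-log points
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic counts following frequency = 64 * occurrence**-2 exactly
# (illustrative only, not taken from the paper).
synthetic = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, intercept = loglog_slope(synthetic)
print(f"slope = {slope:.3f}, prefactor = {10 ** intercept:.1f}")
```

A straight line on the log-log plot with a negative slope is what a power-law distribution looks like: many superfamilies occur rarely and a few occur very often. A least-squares fit on log-transformed counts is the simplest choice but is known to be biased for noisy tails, so real data may call for a maximum-likelihood estimate instead.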
Pages: 16
Related papers
48 in total
[1]  
Apic Gordana, 2003, Journal of Structural and Functional Genomics, V4, P67, DOI 10.1023/A:1026113408773
[2]  
BATES A, 1996, GENOMES MOL BIOL DRU
[3]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[4]   A METHOD TO IDENTIFY PROTEIN SEQUENCES THAT FOLD INTO A KNOWN 3-DIMENSIONAL STRUCTURE [J].
BOWIE, JU ;
LUTHY, R ;
EISENBERG, D .
SCIENCE, 1991, 253 (5016) :164-170
[5]   The ASTRAL compendium for protein structure and sequence analysis [J].
Brenner, SE ;
Koehl, P ;
Levitt, M .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :254-256
[6]   Evidence that plant-like genes in Chlamydia species reflect an ancestral relationship between Chlamydiaceae, cyanobacteria, and the chloroplast [J].
Brinkman, FSL ;
Blanchard, JL ;
Cherkasov, A ;
Av-Gay, Y ;
Brunham, RC ;
Fernandez, RC ;
Finlay, BB ;
Otto, SP ;
Ouellette, BFF ;
Keeling, PJ ;
Rose, AM ;
Hancock, REW ;
Jones, SJM .
GENOME RESEARCH, 2002, 12 (08) :1159-1167
[7]   STATISTICS OF SEQUENCE-STRUCTURE THREADING [J].
BRYANT, SH ;
ALTSCHUL, SF .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 1995, 5 (02) :236-244
[8]   Multiple sequence information for threading algorithms [J].
Defay, TR ;
Cohen, FE .
JOURNAL OF MOLECULAR BIOLOGY, 1996, 262 (02) :314-323
[9]   Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium [J].
Fischer, D ;
Eisenberg, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1997, 94 (22) :11929-11934
[10]   A structural census of the current population of protein sequences [J].
Gerstein, M ;
Levitt, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1997, 94 (22) :11911-11916