Whole genome protein domain analysis using a new method for domain clustering

Cited by: 49
Authors
Gouzy, J
Corpet, F
Kahn, D
Affiliations
[1] INRA, CNRS, Lab Biol Mol Relat Plantes Microorganismes, F-31326 Castanet Tolosan, France
[2] INRA, Lab Genet Cellulaire, F-31326 Castanet Tolosan, France
Source
COMPUTERS & CHEMISTRY | 1999 / Vol. 23 / Issue 3-4
DOI
10.1016/S0097-8485(99)00011-X
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
We present the outcome of a systematic analysis of protein domain shuffling in 17 completed microbial genomes. This analysis was performed using MKDOM Version 2, a completely new version of the domain clustering program MKDOM based on PSI-BLAST recursive homology searches. It makes it possible to delineate the most frequent protein domain building blocks, to identify which domains are found specifically in Bacteria, Archaea or yeast, and to identify which domains are shared between two or all three domains of life. The latter are good candidates for the basic protein building blocks underlying all forms of cellular life. Statistics of multi-domain proteins indicate that some organisms, such as Bacillus subtilis or Mycobacterium tuberculosis, contain an abnormally high number of large multi-domain proteins. We also provide examples of highly shuffled or circularly permuted domains. A WWW graphical interface has been made available to interactively browse domain arrangements of proteins in all 17 genomes, at http://www.toulouse.inra.fr/prodomCG.html. (C) 1999 Elsevier Science Ltd. All rights reserved.
Pages: 333-340
Page count: 8