CHOP proteins into structural domain-like fragments

Cited by: 60
Authors
Liu, JF
Rost, B
Affiliations
[1] Columbia Univ, CUBIC, Dept Biochem & Mol Biophys, New York, NY USA
[2] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY USA
[3] Columbia Univ, Dept Biochem & Mol Biophys, NESG Consortium, New York, NY USA
[4] Columbia Univ, Dept Pharmacol, New York, NY USA
Keywords
genome sequence analysis; protein domains; automatic sequence clustering; protein structure; structural genomics
DOI
10.1002/prot.20095
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
We developed a method, CHOP, that dissects proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two-thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned, on average, ~100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although previously not described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three-fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constituted an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi-member and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, ~24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multi-member clusters in 62 proteomes. © 2004 Wiley-Liss, Inc.
Pages: 678-688
Number of pages: 11