Comparative mapping of sequence-based and structure-based protein domains

Cited by: 20
Authors
Zhang, Y [1]
Chandonia, JM
Ding, C
Holbrook, SR
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, Phys Biosci Div, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Lab, Computat Res Div, Berkeley, CA 94720 USA
[3] Penn State Univ, Sch Informat Sci & Technol, University Pk, PA 16802 USA
DOI
10.1186/1471-2105-6-77
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used, but whether either type of model alone can capture all essential features of domains is still an open question. Methods: Here we provide insight into domain definitions through comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of the mapping, and the properties of the mapping matrices are studied. Results: The mapping results show general agreement between the two databases, as well as many interesting areas of disagreement. In the cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.
Pages: 16
Related papers
23 in total
[1]   Automatic annotation of protein function based on family identification [J].
Abascal, F ;
Valencia, A .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2003, 53 (03) :683-692
[2]   Domain insertions in protein structures [J].
Aroul-Selvam, R ;
Hubbard, T ;
Sasidharan, R .
JOURNAL OF MOLECULAR BIOLOGY, 2004, 338 (04) :633-641
[3]   Protein structure prediction and structural genomics [J].
Baker, D ;
Sali, A .
SCIENCE, 2001, 294 (5540) :93-96
[4]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI 10.1093/nar/gkh121
[5]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[6]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[7]   The ASTRAL compendium for protein structure and sequence analysis [J].
Brenner, SE ;
Koehl, P ;
Levitt, M .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :254-256
[8]   The ASTRAL Compendium in 2004 [J].
Chandonia, JM ;
Hon, G ;
Walker, NS ;
Lo Conte, L ;
Koehl, P ;
Levitt, M ;
Brenner, SE .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D189-D192
[9]  
Delano WL., 2002, The PyMOL Molecular Graphics System
[10]   A comparison of sequence and structure protein domain families as a basis for structural genomics [J].
Elofsson, A ;
Sonnhammer, ELL .
BIOINFORMATICS, 1999, 15 (06) :480-500