SnapDRAGON: a method to delineate protein structural domains from sequence data

Cited by: 68
Authors
George, RA [1]
Heringa, J [1]
Affiliations
[1] Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Funding
UK Medical Research Council;
Keywords
protein; domain; boundaries; prediction; folding;
DOI
10.1006/jmbi.2001.5387
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation. (C) 2002 Elsevier Science Ltd.
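The abstract says boundaries are predicted from the consistency observed across a set of alternative 3D models. The sketch below illustrates one simple way such a consensus could be taken; it is not the authors' published procedure. The function name `consensus_boundaries` and the parameters `window` and `min_fraction` are assumptions introduced here for illustration only.

```python
def consensus_boundaries(model_boundaries, seq_len, window=10, min_fraction=0.5):
    """Return boundary positions supported by enough alternative models.

    model_boundaries: one list of boundary positions (1-based) per model.
    window, min_fraction: hypothetical tuning parameters, not from the paper.
    """
    n_models = len(model_boundaries)
    if n_models == 0:
        return []

    # support[i] = number of models with a boundary within `window` residues of i
    support = [0] * (seq_len + 1)
    for boundaries in model_boundaries:
        covered = set()
        for b in boundaries:
            lo = max(1, b - window)
            hi = min(seq_len, b + window)
            covered.update(range(lo, hi + 1))
        for pos in covered:  # each model votes at most once per position
            support[pos] += 1

    threshold = min_fraction * n_models
    consensus = []
    pos = 1
    while pos <= seq_len:
        if support[pos] >= threshold:
            # Walk to the end of this contiguous run of well-supported positions.
            start = pos
            while pos <= seq_len and support[pos] >= threshold:
                pos += 1
            run = range(start, pos)
            best = max(support[p] for p in run)
            peak = [p for p in run if support[p] == best]
            # Report the middle of the best-supported stretch.
            consensus.append(peak[len(peak) // 2])
        else:
            pos += 1
    return consensus


if __name__ == "__main__":
    # Five hypothetical models of a 250-residue protein.
    models = [[100], [104], [98], [], [200]]
    print(consensus_boundaries(models, seq_len=250))
```

In this toy input, three of the five models place a boundary near residue 100, so that region passes the vote, while the lone boundary at 200 does not.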
Pages: 839-851
Page count: 13