Armadillo: Domain boundary prediction by amino acid composition

Cited by: 52
Authors
Dumontier, M
Yao, R
Feldman, HJ
Hogue, CWV
Affiliations
[1] Univ Toronto, Dept Biochem, Toronto, ON M5S 1A8, Canada
[2] Mt Sinai Hosp, Samuel Lunenfeld Res Inst, Toronto, ON M5G 1X5, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
domain; linker; boundary; prediction; amino acid composition;
DOI
10.1016/j.jmb.2005.05.037
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within +/- 20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple, but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. 
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. As a first-line predictor, domain meta-predictors could yield improved results with Armadillo predictions. (c) 2005 Published by Elsevier Ltd.
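The profile-and-Z-score idea described in the abstract can be illustrated with a minimal sketch. It is not the Armadillo implementation: the index values in `TOY_INDEX` are invented placeholders (not the published DLI), and the window size, the Z-score cutoff, the sliding-mean smoothing, and all function names are assumptions chosen for illustration. Only the ±20-residue tolerance used when scoring sensitivity comes from the abstract.

```python
from statistics import mean, pstdev

# Hypothetical propensity values for illustration only; NOT the published DLI.
# Positive = assumed linker-favouring, negative = assumed linker-disfavouring.
TOY_INDEX = {"P": 1.0, "G": 0.8, "A": -0.5, "V": -0.8, "L": -0.7, "I": -0.9}


def smoothed_profile(seq, index, window=5):
    """Map each residue to its index value, then apply a sliding-window mean."""
    raw = [index.get(aa, 0.0) for aa in seq]  # unknown residues score 0.0
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))  # window is truncated at the termini
    return profile


def z_scores(profile):
    """Standardise a profile against its own mean and standard deviation."""
    mu, sigma = mean(profile), pstdev(profile)
    if sigma == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sigma for x in profile]


def candidate_linkers(seq, index, window=5, z_cut=1.5):
    """Return (start, end) index pairs of contiguous runs with Z >= z_cut."""
    z = z_scores(smoothed_profile(seq, index, window))
    regions, start = [], None
    for i, value in enumerate(z):
        if value >= z_cut and start is None:
            start = i
        elif value < z_cut and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(z) - 1))
    return regions


def sensitivity(predicted, true_linkers, tolerance=20):
    """Fraction of true linkers with a prediction within +/- tolerance residues."""
    if not true_linkers:
        return 0.0
    hits = 0
    for t_start, t_end in true_linkers:
        if any(p_start <= t_end + tolerance and p_end >= t_start - tolerance
               for p_start, p_end in predicted):
            hits += 1
    return hits / len(true_linkers)


# Toy two-domain sequence: hydrophobic "domains" flanking a Pro/Gly-rich "linker".
toy_seq = "AVLI" * 4 + "PGPGPGPG" + "AVLI" * 4
toy_regions = candidate_linkers(toy_seq, TOY_INDEX)
toy_sensitivity = sensitivity(toy_regions, [(16, 23)])
```

On the toy sequence the Pro/Gly-rich stretch stands out against the hydrophobic flanks, so the thresholded Z-scores should mark a region inside it. Real sequences are far noisier, which is consistent with the modest sensitivities reported in the abstract.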
Pages: 1061-1073
Page count: 13