Protein domain identification and improved sequence similarity searching using PSI-BLAST

Cited by: 48
Authors
George, RA [1]
Heringa, J [1]
Affiliations
[1] Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Keywords
domains; modules; PSI-BLAST; sequence; genome
DOI
10.1002/prot.10175
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Protein sequences containing more than one structural domain are problematic when used in homology searches where they can either stop an iterative database search prematurely or cause an explosion of a search to common domains. We describe a method, DOMAINATION, that infers domains and their boundaries in a query sequence from local gapped alignments generated using PSI-BLAST. Through a new technique to recognize domain insertions and permutations, DOMAINATION submits delineated domains as successive database queries in further iterative steps. Assessed over a set of 452 multidomain proteins, the method predicts structural domain boundaries with an overall accuracy of 50% and improves finding distant homologies by 14% compared with PSI-BLAST. DOMAINATION is available as a web-based tool at http://mathbio.nimr.mrc.ac.uk, and the source code is available from the authors upon request. Proteins 2002;48:672-681. © 2002 Wiley-Liss, Inc.
Pages: 672-681
Number of pages: 10