Sensitivity and selectivity in protein structure comparison

Cited by: 81
Authors
Sierk, ML [1]
Pearson, WR [1]
Affiliation
[1] Univ Virginia, Hlth Syst, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
Keywords
structure alignment; database search; statistical significance; CATH database
DOI
10.1110/ps.03328504
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extension (CE), VAST, and Matras were tested along with SGM and PRIDE, which calculate a structural distance between two domains without aligning them. We also tested two sequence alignment programs, SSEARCH and PSI-BLAST. Depending upon the level of selectivity and error model, structure alignment programs can detect roughly twice as many homologous domains in CATH as sequence alignment programs. Dali finds the most homologs, 321-533 of 1120 possible true positives (28.7%-45.7%), at an error rate of 0.1 errors per query (EPQ), whereas PSI-BLAST finds 365 true positives (32.6%), regardless of the error model. At an EPQ of 1.0, Dali finds 42%-70% of possible homologs, whereas Matras finds 49%-57%; PSI-BLAST finds 36.9%. However, Dali achieves >84% coverage before the first error for half of the families tested. Dali and PSI-BLAST find 9.2% and 5.2%, respectively, of the 7056 possible topology pairs at an EPQ of 0.1, and 19.5% and 5.9% at an EPQ of 1.0. Most statistical significance estimates reported by the structural alignment programs overestimate the significance of an alignment by orders of magnitude when compared with the actual distribution of errors. These results help quantify the statistical distinction between analogous and homologous structures, and provide a benchmark for structure comparison statistics.
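The coverage figures above are all quoted at a fixed error rate in errors per query (EPQ). As a rough illustration of how such a number can be computed, the Python sketch below pools scored hits from all queries, ranks them, and counts true positives until the error budget (EPQ times the number of queries) is exceeded. The function name `coverage_at_epq`, the `(score, is_true_positive)` hit format, and the rule of stopping at the first false positive beyond the budget are assumptions made for illustration; the abstract does not describe the authors' exact procedure or their error models.

```python
from typing import Iterable, Tuple


def coverage_at_epq(
    hits: Iterable[Tuple[float, bool]],
    n_queries: int,
    n_possible: int,
    epq: float,
) -> Tuple[int, float]:
    """Count true positives found at a given errors-per-query level.

    hits: pooled (score, is_true_positive) pairs from all queries,
          where a higher score means a more significant hit (assumption).
    n_queries: number of queries that produced the pooled hits.
    n_possible: total number of true relationships that could be found.
    epq: allowed errors per query, e.g. 0.1 or 1.0.
    """
    max_errors = epq * n_queries  # total false positives tolerated
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    true_pos = 0
    false_pos = 0
    for _score, is_tp in ranked:
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
            if false_pos > max_errors:
                break  # error budget exceeded; stop counting
    return true_pos, true_pos / n_possible


# Toy data (hypothetical, not from the paper): 2 queries, 5 possible true pairs.
toy_hits = [(9.0, True), (8.0, True), (7.0, False),
            (6.0, True), (5.0, False), (4.0, True)]
found, coverage = coverage_at_epq(toy_hits, n_queries=2, n_possible=5, epq=0.5)
print(found, coverage)
```

With `epq=0`, the same function gives coverage before the first error, the quantity the abstract reports for Dali on a per-family basis.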
Pages: 773-785
Page count: 13