Combining sensitive database searches with multiple intermediates to detect distant homologues

Cited by: 40
Authors
Salamov, AA
Suwa, M
Orengo, CA
Swindells, MB
Affiliations
[1] Helix Res Inst, Kisarazu, Chiba 292, Japan
[2] UCL, Dept Biochem, London, England
[3] Univ Tsukuba, Tsukuba Adv Res Alliance, Tsukuba, Ibaraki 305, Japan
Source
PROTEIN ENGINEERING | 1999 / Vol. 12 / Issue 02
Keywords
CATH; intermediate searches; sequence analysis; protein structure;
DOI
10.1093/protein/12.2.95
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Using data from the CATH structure classification, we have assessed the BLASTP, FASTA, Smith-Waterman and gapped-BLAST algorithms, developed a portable normalization scheme and identified safe thresholds for database searching. Of the four methods assessed, FASTA, Smith-Waterman and gapped-BLAST perform similarly, whereas the sensitivity of BLASTP was much lower. Introduction of an intermediate sequence search substantially improved the results. When tested on a set of relationships that could not be identified by BLASTP, intermediate sequences were able to find double the number of relationships identified by the Smith-Waterman algorithm alone. However, we found that the benefit of using intermediates varied considerably between families and depended not only on the number of available sequences, but also on their diversity. In an attempt to increase sensitivity further, a multiple intermediate sequence search (MISS) procedure was developed. When assessed on 1906 cases from a wide range of homologous families that could not be detected by the previous approaches, MISS was able to identify 241 additional relationships. MISS uses the full extent of sequence diversity to detect additional relationships, but does not consider any structure-specific information. For this reason, it is more generally applicable than fold recognition and threading methods, which require a library of known structures.
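The general idea of an intermediate sequence search can be illustrated with a minimal sketch: if a query matches an intermediate above a safe threshold, and that intermediate matches a third sequence, the third sequence is reported as a candidate distant homologue even though the direct comparison falls below the threshold. This is not the procedure used in the paper, whose details the abstract does not give. The function names, the toy score table and the 0.5 threshold are all hypothetical, and a real search would substitute normalized scores from a tool such as FASTA or Smith-Waterman.

```python
from typing import Callable, Iterable, Set

# Hypothetical normalized pairwise scores (higher = more similar).
# A real pipeline would obtain these from a database search program.
PAIR_SCORES = {
    frozenset(("A", "B")): 0.9,  # query A clearly matches B
    frozenset(("B", "C")): 0.8,  # B clearly matches C
    frozenset(("A", "C")): 0.2,  # A vs C is below the threshold on its own
}
DB = ["A", "B", "C", "D"]  # D has no detectable relatives


def toy_score(a: str, b: str) -> float:
    """Look up a symmetric toy score; unknown pairs score 0.0."""
    return PAIR_SCORES.get(frozenset((a, b)), 0.0)


def direct_hits(query: str, database: Iterable[str],
                score: Callable[[str, str], float],
                threshold: float) -> Set[str]:
    """Sequences that match the query directly at or above the threshold."""
    return {t for t in database if t != query and score(query, t) >= threshold}


def intermediate_hits(query: str, database: Iterable[str],
                      score: Callable[[str, str], float],
                      threshold: float) -> Set[str]:
    """Direct hits plus anything reachable through one intermediate."""
    database = list(database)
    first_round = direct_hits(query, database, score, threshold)
    found = set(first_round)
    for intermediate in first_round:
        # Re-search the database using each direct hit as a new query.
        found |= direct_hits(intermediate, database, score, threshold)
    found.discard(query)  # the query itself is not a hit
    return found


print(sorted(direct_hits("A", DB, toy_score, 0.5)))
print(sorted(intermediate_hits("A", DB, toy_score, 0.5)))
```

In this toy data the direct search from A reaches only B, while the search through the intermediate B also recovers C. Using every direct hit as a new query is one simple reading of "multiple intermediates"; the paper's MISS procedure may differ.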
Pages: 95-100
Page count: 6