PHOG-BLAST - a new generation tool for fast similarity search of protein families

Cited by: 8
Authors
Merkeev, Igor V.
Mironov, Andrey A.
Affiliations
[1] State Sci Ctr GosNIIGenet, Moscow 113545, Russia
[2] Moscow MV Lomonosov State Univ, Dept Bioengn & Bioinformat, Moscow 119992, Russia
DOI
10.1186/1471-2148-6-51
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Background: The need to compare protein profiles arises frequently in protein research: comparison of protein families, domain searches, and resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence or a profile with a sequence. Algorithms that compare profiles with profiles use dynamic programming and complex scoring functions.

Results: We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile into a string over a finite alphabet and uses hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside all clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10-20 times less computer memory and computation time than PSI-BLAST.

Conclusion: Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxa on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less precise than rigorous profile-profile comparison methods, though it runs much faster and can be used as a hit pre-selecting tool.
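
The two ideas named in the abstract, profile discretization and hashed word lookup, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The centroids, the Euclidean distance, the `max_dist` cutoff, the `x` symbol for out-of-cluster columns, the word length `k`, and all function names are assumptions made for illustration only; the abstract does not specify any of them.

```python
from collections import defaultdict
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNKNOWN = "x"  # hypothetical symbol for a column that lies outside every cluster


def toy_centroids(weight=0.81):
    """Hypothetical cluster centres: each puts `weight` on one residue
    and spreads the remainder evenly over the other 19."""
    rest = (1.0 - weight) / 19
    return {aa: [weight if b == aa else rest for b in AMINO_ACIDS]
            for aa in AMINO_ACIDS}


def discretize(profile, centroids, max_dist=0.5):
    """Map each 20-dimensional frequency column to the letter of its
    nearest centroid (Euclidean distance is an assumption)."""
    letters = []
    for column in profile:
        letter, d = min(((aa, dist(column, c)) for aa, c in centroids.items()),
                        key=lambda pair: pair[1])
        letters.append(letter if d <= max_dist else UNKNOWN)
    return "".join(letters)


def build_index(database, k=3):
    """Hash every k-letter word of each discretized profile to its positions."""
    index = defaultdict(list)
    for name, letters in database.items():
        for i in range(len(letters) - k + 1):
            word = letters[i:i + k]
            if UNKNOWN not in word:
                index[word].append((name, i))
    return index


def seed_hits(query, index, k=3):
    """Count exact word matches between the query and each indexed profile."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name, _pos in index.get(query[i:i + k], ()):
            hits[name] += 1
    return dict(hits)


def one_hot(aa):
    """Toy column in which a single residue has frequency 1."""
    return [1.0 if b == aa else 0.0 for b in AMINO_ACIDS]


centroids = toy_centroids()
profile = [one_hot(aa) for aa in "MKV"] + [[0.05] * 20]  # last column is uniform
query = discretize(profile, centroids)                    # "MKVx"
index = build_index({"famA": "MKVLA", "famB": "GGGGG"})
print(query, seed_hits("MKVL", index))
```

In this toy setting, a database profile that shares many words with the query would be passed on to a slower, more rigorous comparison, which matches the hit pre-selection role described in the conclusion.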
Pages: 9