Homology detection via family pairwise search

Times cited: 31
Author
Grundy, WN [1]
Affiliation
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
homology detection; proteins; pairwise sequence comparison; motif analysis; statistical modeling
DOI
10.1089/cmb.1998.5.479
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall between these two extremes. The current work introduces a straightforward generalization of pairwise sequence comparison algorithms to the case in which multiple query sequences are available. This algorithm, called Family Pairwise Search (FPS), combines the pairwise sequence comparison scores obtained from each query sequence. A BLAST implementation of FPS is compared to representative examples of hidden Markov modeling (HMMER) and motif modeling (MEME). The three techniques are compared across a wide range of protein families, using query sets of varying sizes. BLAST FPS significantly outperforms the motif-based and HMM methods. Furthermore, FPS is much more efficient than the training algorithms for statistical models.
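A minimal Python sketch of the FPS idea, under stated assumptions: the BLAST implementation described in the abstract is replaced here by a plain Smith-Waterman local alignment with a toy scoring scheme (match +2, mismatch -1, linear gap -2), and the per-query scores are combined by either the maximum or the sum. The names `local_alignment_score` and `fps_rank`, the scoring constants, and both combination rules are hypothetical illustrations, not the paper's exact procedure.

```python
"""Hypothetical sketch of Family Pairwise Search (FPS).

Assumptions (not from the paper): a plain Smith-Waterman local alignment
score stands in for BLAST, and per-query scores are combined by max or sum.
"""
from typing import Callable, Dict, List, Sequence, Tuple

# Toy scoring scheme (assumed for illustration; BLAST would use e.g. BLOSUM62).
MATCH, MISMATCH, GAP = 2, -1, -2


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + GAP,      # a[i-1] aligned to a gap
                curr[j - 1] + GAP,  # b[j-1] aligned to a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best


def fps_rank(
    queries: Sequence[str],
    database: Dict[str, str],
    combine: Callable[[List[int]], float] = max,
) -> List[Tuple[str, float]]:
    """Score each database sequence against every query; rank by combined score."""
    ranked = []
    for name, target in database.items():
        # One pairwise score per known family member (the "family pairwise" step).
        scores = [local_alignment_score(q, target) for q in queries]
        ranked.append((name, combine(scores)))
    # Highest combined score first: the most likely homologs head the list.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    family = ["HEAGAWGHEE", "HEAGAWGHEA"]  # toy query set of known homologs
    database = {"cand1": "PAWHEAE", "cand2": "HEAGAWGHEE", "cand3": "KKKKKK"}
    print(fps_rank(family, database, combine=max))
    print(fps_rank(family, database, combine=sum))
```

Passing `combine=max` rewards a target that closely matches any single family member, while `combine=sum` rewards targets that resemble many members; which rule works better is an empirical question this sketch does not settle.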
Pages: 479-491
Number of pages: 13