Hybrid alignment: high-performance with universal statistics

Cited by: 17
Authors
Yu, YK
Bundschuh, R
Hwa, T
Affiliations
[1] Florida Atlantic Univ, Dept Phys, Boca Raton, FL 33431 USA
[2] Univ Calif San Diego, Dept Phys, La Jolla, CA 92093 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/18.6.864
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704
Abstract
The score statistics of a recently introduced 'hybrid alignment' algorithm are studied in detail numerically. An extensive survey across the 2216 models of protein domains contained in the Pfam v5.4 database (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000) verifies the theoretical predictions: for the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter lambda taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores, freeing users to experiment with different scoring parameters and functions. The performance of the hybrid algorithm in detecting sequence homology is also studied. For protein sequences from the SCOP database (Murzin et al., J. Mol. Biol., 247, 536-540, 1995) using uniform scoring functions, the performance is found to be comparable to the best of the existing methods. Preliminary results using the PfamA database suggest that the hybrid algorithm achieves performance similar to that of existing methods for position-specific scoring systems as well. Hybrid alignment is thereby established as a high-performance alignment algorithm with well-characterized, universal statistics.
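To illustrate why a universal lambda removes the need for simulations, the following is a minimal sketch of turning a score into a p-value under a Gumbel distribution. It is not the authors' code: the function name `gumbel_pvalue` and the location parameter `mu` are assumptions introduced here for illustration, and in practice the location depends on sequence lengths and a model-specific prefactor that the abstract does not specify. Only the default `lam=1.0` reflects the value stated in the abstract.

```python
import math


def gumbel_pvalue(score, mu, lam=1.0):
    """Upper-tail probability P(S >= score) for a Gumbel-distributed score.

    mu  : location parameter (hypothetical input; must be supplied by the caller)
    lam : Gumbel decay parameter; 1.0 is the universal value quoted in the abstract
    """
    # Gumbel CDF: F(s) = exp(-exp(-lam * (s - mu))), so the p-value is 1 - F(s).
    # -expm1(-x) computes 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-math.exp(-lam * (score - mu)))
```

For scores well above `mu`, the p-value reduces to approximately `exp(-lam * (score - mu))`, so with lambda fixed at 1 each additional unit of score lowers the p-value by a factor of about e.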
Pages: 864-872
Number of pages: 9
Related papers
30 in total
[1]  
Altschul SF, 1996, METHOD ENZYMOL, V266, P460
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[4]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkh121, 10.1093/nar/gkr1065]
[5]   Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships [J].
Brenner, SE ;
Chothia, C ;
Hubbard, TJP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) :6073-6078
[6]  
Brenner SE, 1996, METHOD ENZYMOL, V266, P635
[7]  
Bucher P, 1996, Proc Int Conf Intell Syst Mol Biol, V4, P44
[8]  
BUNDSCHUH R, 2000, P 4 ANN INT C COMP M, P86
[9]  
COLLINS JF, 1988, COMPUT APPL BIOSCI, V4, P67
[10]  
Dayhoff M.O., 1978, ATLAS PROTEIN SEQ ST, V5