Scoring profile-to-profile sequence alignments

Cited by: 89
Authors
Wang, GL [1]
Dunbrack, RL [1]
Affiliations
[1] Fox Chase Canc Ctr, Inst Canc Res, Philadelphia, PA 19111 USA
Keywords
sequence profiles; profile-profile alignment; PSI-BLAST;
DOI
10.1110/ps.03601504
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.
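Steps (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes Henikoff-style position-based sequence weights, a small uniform pseudocount, and a plain dot product as the column-column score, which is only one of many possible scoring functions. The function names and the `pseudocount` value are hypothetical choices made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(msa):
    """Henikoff-style position-based weights, normalized to sum to 1.

    msa is a list of equal-length aligned sequences; any character
    outside AMINO_ACIDS (e.g. '-') is treated as a gap and ignored.
    """
    weights = [0.0] * len(msa)
    for j in range(len(msa[0])):
        column = [seq[j] for seq in msa]
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        k = len(counts)  # number of distinct residue types in this column
        for i, c in enumerate(column):
            if c in counts:
                # Rare residues in diverse columns contribute more weight.
                weights[i] += 1.0 / (k * counts[c])
    total = sum(weights)
    if total == 0.0:  # alignment made only of gaps: fall back to uniform
        return [1.0 / len(msa)] * len(msa)
    return [w / total for w in weights]


def column_frequencies(msa, weights, j, pseudocount=0.01):
    """Weighted amino acid frequencies for column j (assumed pseudocount)."""
    freqs = {a: pseudocount for a in AMINO_ACIDS}
    for seq, w in zip(msa, weights):
        if seq[j] in freqs:
            freqs[seq[j]] += w
    total = sum(freqs.values())
    return {a: f / total for a, f in freqs.items()}


def dot_product_score(p, q):
    """One simple column-column score: dot product of frequency vectors."""
    return sum(p[a] * q[a] for a in AMINO_ACIDS)


# Toy alignment (hypothetical data, for illustration only).
msa = ["ACD", "ACE", "GCE"]
w = position_based_weights(msa)
col0 = column_frequencies(msa, w, 0)
col1 = column_frequencies(msa, w, 1)
print(w)
print(dot_product_score(col1, col1), dot_product_score(col1, col0))
```

In the toy alignment, the middle sequence shares its residues with a neighbor in every column, so it receives a lower weight than the other two. The fully conserved column scores higher against itself than against the mixed first column.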
Pages: 1612-1626
Number of pages: 15