Protein database searches using compositionally adjusted substitution matrices

Times cited: 796
Authors
Altschul, SF [1]
Wootton, JC [1]
Gertz, EM [1]
Agarwala, R [1]
Morgulis, A [1]
Schäffer, AA [1]
Yu, YK [1]
Affiliation
[1] National Library of Medicine, National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
Keywords
BLAST; BLOSUM; compositional adjustment; protein database searches; substitution matrices
DOI
10.1111/j.1742-4658.2005.04945.x
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort have therefore gone into constructing substitution matrices, and the quality of search results can depend strongly on choosing the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, over half of the related sequence pairs satisfy at least one of these criteria. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.
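The abstract does not spell out the transformation, so the following is only a minimal sketch under stated assumptions. It assumes the standard matrix is in log-odds form, s_ij = ln(q_ij / (p_i p_j)) / λ, so that it implies a set of target frequencies q_ij. It also assumes "adjustment" means finding new target frequencies that stay as close as possible to q_ij in relative entropy while having the two sequences' compositions P and P' as row and column sums. Iterative proportional fitting (alternately rescaling rows and columns) computes such a projection. It is a stand-in chosen for illustration, not necessarily the procedure implemented in BLAST, which may impose further constraints and round scores to integers. The function names, the 3-letter toy alphabet, and all numbers are hypothetical.

```python
import math


def adjust_target_frequencies(q0, row_comp, col_comp, max_iter=1000, tol=1e-12):
    """Hypothetical helper: rescale rows and columns of the target-frequency
    matrix q0 (iterative proportional fitting) until row sums equal row_comp
    and column sums equal col_comp. All entries of q0 must be positive."""
    q = [row[:] for row in q0]
    n, m = len(q), len(q[0])
    for _ in range(max_iter):
        # Match row marginals to the first sequence's composition.
        for i in range(n):
            factor = row_comp[i] / sum(q[i])
            q[i] = [x * factor for x in q[i]]
        # Match column marginals to the second sequence's composition.
        for j in range(m):
            factor = col_comp[j] / sum(q[r][j] for r in range(n))
            for r in range(n):
                q[r][j] *= factor
        # Columns are now exact; stop once the rows are also within tol.
        if max(abs(sum(q[r]) - row_comp[r]) for r in range(n)) < tol:
            break
    return q


def log_odds_scores(q, row_comp, col_comp, lam):
    """Hypothetical helper: s_ij = ln(q_ij / (P_i * P'_j)) / lam, unrounded."""
    return [
        [math.log(q[i][j] / (row_comp[i] * col_comp[j])) / lam
         for j in range(len(col_comp))]
        for i in range(len(row_comp))
    ]


# Toy 3-letter alphabet (made-up numbers): symmetric target frequencies whose
# marginals, 0.3 / 0.3 / 0.4, play the role of the standard background.
q0 = [[0.20, 0.05, 0.05],
      [0.05, 0.20, 0.05],
      [0.05, 0.05, 0.30]]
P = [0.6, 0.2, 0.2]        # composition of sequence 1 (biased toward letter 0)
P_prime = [0.2, 0.2, 0.6]  # composition of sequence 2 (biased toward letter 2)
lam = math.log(2) / 2      # half-bit score units, assumed for illustration

q = adjust_target_frequencies(q0, P, P_prime)
S = log_odds_scores(q, P, P_prime, lam)

for row in S:
    print(" ".join(f"{s:6.2f}" for s in row))
```

One design consequence: because the adjusted target frequencies sum to 1 and have P and P' as marginals, the adjusted scores keep the chosen λ for those two compositions, since Σ P_i P'_j e^{λ s_ij} = Σ q_ij = 1.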
Pages: 5101–5109
Page count: 9