A PROTEIN ALIGNMENT SCORING SYSTEM SENSITIVE AT ALL EVOLUTIONARY DISTANCES

Cited by: 103

Authors:

ALTSCHUL, SF

Affiliation:

[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland

Source:

JOURNAL OF MOLECULAR EVOLUTION | 1993 / Vol. 36 / Issue 3

Keywords:

HOMOLOGY; SEQUENCE COMPARISON; STATISTICAL SIGNIFICANCE; ALIGNMENT ALGORITHMS; PATTERN RECOGNITION;

DOI:

10.1007/BF00160485

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Protein sequence alignments generally are constructed with the aid of a "substitution matrix" that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a "log-odds" matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
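The two quantitative ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the two-letter alphabet, the frequencies, and the function names (`log_odds_matrix`, `implied_targets`, `significance_factor`) are all hypothetical and chosen only for illustration. The sketch builds a log-odds matrix in bits from assumed target pair frequencies, then goes the other way and recovers the scale factor and the implied pair distribution from the scores alone, which is the sense in which a matrix is "implicitly" log-odds. It also converts a cost in bits into a factor in statistical significance.

```python
import math

def log_odds_matrix(q, p):
    """Scores in bits: s(a, b) = log2(q_ab / (p_a * p_b))."""
    return {(a, b): math.log2(q_ab / (p[a] * p[b])) for (a, b), q_ab in q.items()}

def implied_targets(scores, p, iterations=200):
    """Find the positive lam with sum p_a p_b exp(lam * s_ab) = 1 by bisection,
    then return lam and the implied pair frequencies q_ab = p_a p_b exp(lam * s_ab).
    Assumes a negative expected score and at least one positive score."""
    def f(lam):
        return sum(p[a] * p[b] * math.exp(lam * s) for (a, b), s in scores.items()) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:          # grow the bracket until the sum exceeds 1
        hi *= 2.0
    for _ in range(iterations):  # f < 0 between 0 and the root, f > 0 beyond it
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2.0
    q = {(a, b): p[a] * p[b] * math.exp(lam * s) for (a, b), s in scores.items()}
    return lam, q

def significance_factor(bits):
    """A cost of `bits` bits corresponds to a factor of 2**bits in significance."""
    return 2.0 ** bits

# Hypothetical toy data: two-letter alphabet with made-up frequencies.
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}

scores = log_odds_matrix(q, p)
lam, q_implied = implied_targets(scores, p)
print(scores)
print(lam, q_implied)
print(significance_factor(2.0), math.log2(5.0))
```

Because the toy scores are expressed in bits, the recovered scale factor should be close to ln 2 and the implied frequencies should match the ones used to build the matrix. The last line relates the abstract's two figures: a factor of five corresponds to log2(5), about 2.32 bits, so a cost slightly over two bits is somewhat less than a factor of five.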


Pages: 290-300

Page count: 11
