Statistical alignment: Computational properties, homology testing and goodness-of-fit

Cited by: 58

Authors:

Hein, J

Wiuf, C

Knudsen, B

Moller, MB

Wibling, G

Affiliations:

[1] Aarhus Univ, Inst Biol Sci, Dept Ecol & Genet, DK-8000 Aarhus C, Denmark

[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England

[3] Aarhus Univ, Inst Comp Sci, DK-8000 Aarhus C, Denmark

Source:

JOURNAL OF MOLECULAR BIOLOGY | 2000 / Volume 302 / Issue 01

Funding:

Biotechnology and Biological Sciences Research Council (UK);

Keywords:

statistical alignment; homology testing; goodness-of-fit;

DOI:

10.1006/jmbi.2000.4061

Chinese Library Classification number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino, and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. (C) 2000 Academic Press.
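
The idea of confining the calculation to a band can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a plain edit-distance recursion in place of the TKF91 likelihood recursions, and it centres the band on the main diagonal rather than on a similarity-based alignment. The function name `banded_edit_distance` and the parameter `half_width` are hypothetical, chosen only for illustration.

```python
def banded_edit_distance(a, b, half_width):
    """Edit distance between a and b using only cells with |i - j| <= half_width.

    Illustrative only: a unit-cost edit-distance recursion stands in for the
    TKF91 likelihood recursions, and the band follows the main diagonal.
    Returns None when the end cell (len(a), len(b)) lies outside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > half_width:
        return None

    # Row 0: aligning the empty prefix of a with b[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, half_width) + 1)}

    for i in range(1, n + 1):
        lo = max(0, i - half_width)
        hi = min(m, i + half_width)
        cur = {}
        for j in range(lo, hi + 1):
            best = float("inf")
            if j in prev:
                # a[i-1] aligned against a gap
                best = min(best, prev[j] + 1)
            if j - 1 in cur:
                # b[j-1] aligned against a gap
                best = min(best, cur[j - 1] + 1)
            if j - 1 in prev:
                # a[i-1] aligned against b[j-1] (match or substitution)
                best = min(best, prev[j - 1] + (a[i - 1] != b[j - 1]))
            cur[j] = best
        prev = cur

    return prev[m]


print(banded_edit_distance("kitten", "sitting", 1))  # 3
```

Each row visits at most `2 * half_width + 1` cells, so the work grows roughly with sequence length times band width instead of with the product of the two lengths. The result is exact only when an optimal path stays inside the band, which is why the choice of band centre matters.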


Pages: 265-279

Number of pages: 15

