Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise

Cited by: 3
Authors
Dessimoz, Christophe [1]
Gil, Manuel [1]
Affiliations
[1] ETH, Dept Comp Sci, CH-8092 Zurich, Switzerland
DOI
10.1186/1471-2148-8-179
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Background: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection. Results: In this paper, we introduce an estimator for the covariance of distances from sequences aligned pairwise. Its performance is analyzed through extensive Monte Carlo simulations, and compared to the well-known variance estimator of ML distances. Our covariance estimator can be used together with the ML variance estimator to form covariance matrices. Conclusion: The estimator performs similarly to the ML variance estimator. In particular, it shows no sign of bias when sequence divergence is below 150 PAM units (i.e. above ~29% expected sequence identity). Above that distance, the covariances tend to be underestimated, but then ML variances are also underestimated.
Pages: 9