Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments

Cited by: 13
Authors
Fox, Gearoid
Sievers, Fabian
Higgins, Desmond G. [1 ]
Affiliation
[1] Univ Coll Dublin, Conway Inst Biomol & Biomed Res, Dublin 4, Ireland
Funding
Science Foundation Ireland;
Keywords
CHAINED GUIDE TREES; CONTACT PREDICTION; SOFTWARE; PERFORMANCE; ALGORITHM;
DOI
10.1093/bioinformatics/btv592
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data.
Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins.
Pages: 814-820
Page count: 7
Related papers
31 entries in total
[1]  
[Anonymous], PFAM SET PROTEIN ID
[2]  
[Anonymous], BMC BIOINFORMATICS
[3]  
[Anonymous], PROTEIN SCI
[4]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[5]   Sequence embedding for fast construction of guide trees for multiple sequence alignment [J].
Blackshields, Gordon ;
Sievers, Fabian ;
Shi, Weifeng ;
Wilm, Andreas ;
Higgins, Desmond G. .
ALGORITHMS FOR MOLECULAR BIOLOGY, 2010, 5
[6]   Reply to Tan et al.: Differences between real and simulated proteins in multiple sequence alignments [J].
Boyce, Kieran ;
Sievers, Fabian ;
Higgins, Desmond G. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2015, 112 (02) :E101-E101
[7]   Simple chained guide trees give high-quality protein multiple sequence alignments [J].
Boyce, Kieran ;
Sievers, Fabian ;
Higgins, Desmond G. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2014, 111 (29) :10556-10561
[8]   Phylogenetic assessment of alignments reveals neglected tree signal in gaps [J].
Dessimoz, Christophe ;
Gil, Manuel .
GENOME BIOLOGY, 2010, 11 (04)
[9]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (09) :755-763
[10]   MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797