Quality measures for protein alignment benchmarks

Cited by: 86
Author
Edgar, Robert C.
Keywords
MULTIPLE SEQUENCE ALIGNMENT; SECONDARY STRUCTURE; CRYSTAL-STRUCTURE; UBIQUITIN LIGASE; FOLD SPACE; DATABASE; ASSIGNMENT; DOMAIN; ACCURACY; SEARCH
DOI
10.1093/nar/gkp1196
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multiple protein sequence alignment methods are central to many applications in molecular biology. These methods are typically assessed on benchmark datasets including BALIBASE, OXBENCH, PREFAB and SABMARK, which are important to biologists in making informed choices between programs. In this article, annotations of domain homology and secondary structure are used to define new measures of alignment quality and are used to make the first systematic, independent evaluation of these benchmarks. These measures indicate sensitivity and specificity while avoiding the ambiguous residue correspondences and arbitrary distance cutoffs inherent to structural superpositions. Alignments by selected methods that indicate high-confidence columns (ALIGN-M, DIALIGN-T, FSA and MUSCLE) are also assessed. Fold space coverage and effective benchmark database sizes are estimated by reference to domain annotations, and significant redundancy is found in all benchmarks except SABMARK. Questionable alignments are found in all benchmarks, especially in BALIBASE where 87% of sequences have unknown structure, 20% of columns contain different folds according to SUPERFAMILY and 30% of 'core block' columns have conflicting secondary structure according to DSSP. A careful analysis of current protein multiple alignment benchmarks calls into question their ability to determine reliable algorithm rankings.
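As a rough illustration of the kind of column-level check the abstract describes (columns with conflicting secondary structure), the following is a minimal sketch, not the paper's actual measure. It assumes each aligned residue has already been labeled with a reduced secondary-structure state ('H' helix, 'E' strand, 'C' other, '-' gap), and it assumes a column "conflicts" when it contains both a helix and a strand state. The function name and the input format are hypothetical.

```python
from typing import List


def conflicting_column_fraction(aligned_ss: List[str]) -> float:
    """Fraction of columns that contain both helix ('H') and strand ('E').

    aligned_ss holds one string per sequence, all of equal length. Each
    character is the secondary-structure state of the aligned residue
    ('H', 'E' or 'C'), or '-' for a gap. This conflict definition is an
    assumption made for illustration only.
    """
    if not aligned_ss:
        raise ValueError("alignment is empty")
    width = len(aligned_ss[0])
    if any(len(row) != width for row in aligned_ss):
        raise ValueError("all rows must have the same length")
    if width == 0:
        return 0.0

    conflicts = 0
    # zip(*rows) walks the alignment column by column.
    for column in zip(*aligned_ss):
        states = set(column)
        if "H" in states and "E" in states:
            conflicts += 1
    return conflicts / width


# Toy alignment of three sequences, ten columns wide (made-up data).
example = [
    "HHHH-CCEEE",
    "HHHEECCEEE",
    "HHHHHCC-EE",
]
print(conflicting_column_fraction(example))
```

In this toy input, only the fourth and fifth columns mix helix and strand states, so two of the ten columns are counted as conflicting.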
Pages: 2145-2153
Page count: 9