BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark

Cited by: 295
Authors
Thompson, JD
Koehl, P
Ripp, R
Poch, O
Affiliations
[1] Univ Strasbourg 1, INSERM, CNRS, Inst Genet & Biol Mol & Cellulaire, Dept Biol & Ge, F-67404 Illkirch Graffenstaden, France
[2] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[3] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Keywords
alignment accuracy; alignment reliability; reference alignment; program evaluation; program comparison; structure superposition;
DOI
10.1002/prot.20527
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.
Pages: 127-136
Page count: 10