3DCoffee: Combining protein sequences and structures within multiple sequence alignments

Cited by: 237
Authors
O'Sullivan, O
Suhre, K
Abergel, C
Higgins, DG
Notredame, C
Affiliations
[1] CNRS, UPR 2589, F-13402 Marseille, France
[2] Univ Coll Dublin, Conway Inst, Dublin 4, Ireland
[3] Swiss Inst Bioinformat, CH-1066 Epalinges, Switzerland
Keywords
multiple alignment; structural superposition; TCoffee; threading; sap;
DOI
10.1016/j.jmb.2004.04.058
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve alignment accuracy on our HOMSTRAD dataset by four percentage points when using only one structure per dataset. Using two structures yields an improvement of ten percentage points. The measurements carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of the number of provided structures to the total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment. © 2004 Elsevier Ltd. All rights reserved.
Pages: 385-395
Page count: 11