Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information

Times cited: 42
Authors
O'Donoghue, P [1]
Luthey-Schulten, Z [1]
Affiliation
[1] Univ Illinois, Dept Chem, Urbana, IL 61801 USA
Funding
US National Science Foundation
Keywords
protein structure profiles; evolution; non-redundant set; aminoacyl-tRNA synthetase; OB-fold
DOI
10.1016/j.jmb.2004.11.053
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline code
071010; 081704
Abstract
We present a new algorithm, based on the multidimensional QR factorization, that removes redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax = b, was designed to reorder the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A yields a minimal basis set that spans the phase space of the problem at hand well. By recasting the problem of redundancy in multiple structural alignments into this framework, with the matrix A now describing the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures that best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q_H, is presented. When the effect and presence of gaps are properly accounted for, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyzing the evolution of structure among key, universally distributed proteins involved in translation, and to selecting representatives from an ensemble of NMR structures, are also discussed. (C) 2004 Elsevier Ltd. All rights reserved.
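The column-reordering idea in the abstract can be illustrated with a small sketch of QR with column pivoting, written here as modified Gram-Schmidt with pivoting (in exact arithmetic this gives the same column order as Householder QR with column pivoting). It assumes each protein is already encoded as one numeric column vector; that encoding, the function names `pivoted_column_order` and `select_representatives`, and the `rel_tol` stopping rule are all hypothetical illustrations, not the authors' actual matrix construction or cutoff, which the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple


def _sq_norm(v: Sequence[float]) -> float:
    return sum(x * x for x in v)


def pivoted_column_order(
    columns: Sequence[Sequence[float]], tol: float = 1e-12
) -> Tuple[List[int], List[float]]:
    """Order columns from most to least linearly independent.

    At each step the column with the largest residual norm (the part not yet
    spanned by the columns already chosen) becomes the next pivot, and the
    remaining columns are orthogonalized against it. Returns the pivot order
    and each pivot's residual norm (the |R_kk| diagonal of a pivoted QR).
    Columns whose residual falls below the absolute `tol` get norm 0.0.
    """
    # Hypothetical input: one numeric vector per aligned protein structure.
    residuals = [list(map(float, c)) for c in columns]
    remaining = list(range(len(residuals)))
    order: List[int] = []
    norms: List[float] = []
    while remaining:
        best = max(remaining, key=lambda j: _sq_norm(residuals[j]))
        norm = math.sqrt(_sq_norm(residuals[best]))
        if norm <= tol:
            # Every remaining column is (numerically) already in the span.
            order.extend(remaining)
            norms.extend([0.0] * len(remaining))
            break
        order.append(best)
        norms.append(norm)
        remaining.remove(best)
        q = [x / norm for x in residuals[best]]
        # Remove the new basis direction from every remaining column.
        for j in remaining:
            dot = sum(u * r for u, r in zip(q, residuals[j]))
            residuals[j] = [r - dot * u for r, u in zip(residuals[j], q)]
    return order, norms


def select_representatives(
    columns: Sequence[Sequence[float]], rel_tol: float = 0.25
) -> List[int]:
    """Keep pivots whose residual norm is at least rel_tol times the first.

    The relative cutoff is an illustrative assumption, not the paper's rule.
    """
    order, norms = pivoted_column_order(columns)
    if not norms or norms[0] == 0.0:
        return []
    return [j for j, n in zip(order, norms) if n >= rel_tol * norms[0]]


if __name__ == "__main__":
    # Column 1 is a near-duplicate of column 0, so it is ordered last.
    cols = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(pivoted_column_order(cols)[0])
    print(select_representatives(cols, rel_tol=0.5))
```

In this toy example the near-duplicate column is pivoted last and dropped, so the two mutually independent columns remain as the "basis set." A pure-Python loop keeps the sketch dependency-free; with SciPy available, `scipy.linalg.qr(A, pivoting=True)` returns the same kind of column permutation directly.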
Pages: 875-894
Page count: 20