De novo identification of highly diverged protein repeats by probabilistic consistency

Cited by: 114
Authors
Biegert, A. [1, 2]
Soeding, J. [1, 2]
Affiliations
[1] Max Planck Inst Dev Biol, Dept Protein Evolut, D-72076 Tubingen, Germany
[2] Univ Munich, Gene Ctr Munich, D-81377 Munich, Germany
DOI
10.1093/bioinformatics/btn039
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments.
Results: We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance.
Pages: 807-814
Page count: 8