Identification of related proteins with weak sequence identity using secondary structure information

Times cited: 46
Authors
Geourjon, C [1]
Combet, C [1]
Blanchet, C [1]
Deléage, G [1]
Affiliation
[1] Inst Biol & Chim Prot, CNRS, UMR 5086, Pole Bioinformat Lyonnais, F-69367 Lyon 07, France
Keywords
protein; molecular modeling; sequence; databank; alignment; structure prediction; secondary structure; Web server
DOI
10.1110/ps.30001
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise similarity between secondary structures was measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. By taking into account the secondary structures of proteins detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins, to select proteins for experimental investigation in a structural genomics approach, and for genome annotation.
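The comparison and filtering steps described in the abstract can be sketched in Python. This is a minimal, non-authoritative sketch: it assumes the SOV'99 definition of segment overlap (Zemla et al., 1999), a three-state H/E/C alphabet, and two secondary-structure strings that are already position-aligned and of equal length. The abstract does not say which Sov variant or alignment handling the authors used, and the helper names (`segments`, `sov`, `rescue_hits`) and the `(hit_id, e_value, hit_ss)` hit format are hypothetical. Only the 50% cutoff comes from the abstract. Note that this Sov definition treats the first string as the reference, so the score is not symmetric.

```python
"""Sketch: Sov-based comparison of two aligned secondary-structure strings.

Assumptions (not taken from the paper): SOV'99-style definition,
three-state H/E/C alphabet, equal-length position-aligned strings.
"""
from itertools import groupby

SOV_THRESHOLD = 50.0  # cutoff stated in the abstract (Sov > 50%)


def segments(ss, state):
    """Return inclusive (start, end) indices of each maximal run of `state`."""
    segs, pos = [], 0
    for label, run in groupby(ss):
        n = len(list(run))
        if label == state:
            segs.append((pos, pos + n - 1))
        pos += n
    return segs


def sov(ref, other, states="HEC"):
    """SOV'99-style segment overlap in percent, with `ref` as the reference."""
    if len(ref) != len(other):
        raise ValueError("secondary structures must be aligned to equal length")
    score, norm = 0.0, 0
    for state in states:
        other_segs = segments(other, state)
        for a1, b1 in segments(ref, state):
            len1 = b1 - a1 + 1
            partners = [(a2, b2) for a2, b2 in other_segs if a2 <= b1 and a1 <= b2]
            if not partners:
                norm += len1  # reference segment with no overlapping partner
                continue
            for a2, b2 in partners:
                len2 = b2 - a2 + 1
                minov = min(b1, b2) - max(a1, a2) + 1  # shared residues
                maxov = max(b1, b2) - min(a1, a2) + 1  # combined extent
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                score += (minov + delta) / maxov * len1
                norm += len1  # each overlapping pair adds len1 to N(i)
    return 100.0 * score / norm if norm else 0.0


def rescue_hits(query_ss, hits, threshold=SOV_THRESHOLD):
    """Keep IDs of hits whose secondary structure gives Sov > threshold.

    `hits` is a hypothetical list of (hit_id, e_value, hit_ss) tuples, e.g.
    weak BLAST/FASTA/SSEARCH hits from the high-E-value region.
    """
    return [hit_id for hit_id, _e_value, hit_ss in hits
            if sov(query_ss, hit_ss) > threshold]


if __name__ == "__main__":
    query = "CCHHHHHHCCEEEECC"
    hits = [("hit_1", 3.2, "CCCHHHHHCCEEEECC"),
            ("hit_2", 4.7, "EEEECCCCCCHHHHHH")]
    print(rescue_hits(query, hits))
```

Because the `delta` term credits partially matching segments, a helix or strand whose ends are shifted by a residue or two is penalized much less than a per-residue identity measure would penalize it, which is why an overlap score of this kind suits comparing predicted with observed structures.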
Pages: 788-797
Number of pages: 12
References (29 in total)
[1] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[2] Combet C, Blanchet C, Geourjon C, Deléage G. NPS@: Network Protein Sequence Analysis. Trends in Biochemical Sciences, 2000, 25(3): 147-150.
[3] Geourjon C. Computer Applications in the Biosciences, 1995, 11: 681.
[4] Gribskov M, McLachlan AD, Eisenberg D. Profile analysis: detection of distantly related proteins. Proceedings of the National Academy of Sciences of the United States of America, 1987, 84(13): 4355-4358.
[5] Hargbo J. Proteins, 1999, 36: 68. DOI 10.1002/(SICI)1097-0134(19990701)36:1<68::AID-PROT6>3.3.CO;2-T
[7] Hobohm U. Protein Science, 1994, 3: 522.
[8] Holm L. Nucleic Acids Research, 1994, 22: 3600.
[9] Jones DT. Proteins, 1999: 104.
[10] Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983, 22(12): 2577-2637.