Reduction of protein sequence complexity by residue grouping

Cited by: 121
Authors
Li, TP
Fan, K
Wang, J
Wang, W
Affiliations
[1] Nanjing Univ, Inst Biophys, Natl Lab Solid State Microstruct, Nanjing 210093, Peoples R China
[2] Nanjing Univ, Dept Phys, Nanjing 210093, Peoples R China
Source
PROTEIN ENGINEERING | 2003 / Vol. 16 / No. 05
Funding
National Natural Science Foundation of China;
Keywords
compositions of amino acids; protein fold recognition; reduced alphabet of amino acids; residue grouping; similarity matrix; AMINO-ACID ALPHABETS; SUBSTITUTION MATRICES; FOLDING PROBLEM;
DOI
10.1093/protein/gzg044
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
It is well known that there are some similarities among the various naturally occurring amino acids. Thus, the complexity of protein systems could be reduced by sorting similar amino acids into groups, so that protein sequences can be simplified using reduced alphabets. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet by which proteins can be folded. Various reduced alphabets are obtained by preserving the maximal information in the simplified protein sequence relative to the parent sequence, as measured by global sequence alignment. With these reduced alphabets and simplified similarity matrices, we achieve recognition of the protein fold based on the similarity score of the sequence alignment. The coverage in the SCOP40 dataset is obtained for various levels of reduction in the number of amino acid types; coverage is the ratio of the number of homologous pairs detected by the program BLAST to the number marked by SCOP40. For reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains at almost the same level as with the full alphabet of 20 types, which implies that 10 types of amino acids may be the degree of freedom for characterizing the complexity in proteins.
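As a rough illustration of the two ideas in the abstract (rewriting a sequence in a reduced alphabet, and measuring coverage as a ratio of detected to reference homologous pairs), here is a minimal Python sketch. The 10-group partition `EXAMPLE_GROUPS`, the function names, and the toy pair labels are all hypothetical and chosen only for illustration; this record does not list the groupings actually derived in the paper, and the sketch does not run BLAST or use SCOP40 data.

```python
# Hypothetical 10-group partition of the 20 standard amino acids.
# Illustrative only: this record does not state the paper's actual groups.
EXAMPLE_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every residue to the first letter of its group (the representative)."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence using the reduced alphabet."""
    return "".join(mapping[residue] for residue in sequence.upper())


def coverage(detected_pairs, reference_pairs):
    """Fraction of reference homologous pairs that appear among the detected pairs.

    Pairs are treated as unordered, so (a, b) and (b, a) count as the same pair.
    """
    reference = {frozenset(pair) for pair in reference_pairs}
    detected = {frozenset(pair) for pair in detected_pairs}
    return len(detected & reference) / len(reference)


mapping = build_mapping(EXAMPLE_GROUPS)
reduced = reduce_sequence("MKVLDE", mapping)

# Toy identifiers standing in for domain pairs; not real SCOP40 entries.
toy_coverage = coverage([("a", "b"), ("c", "d")], [("b", "a"), ("x", "y")])
```

In a real evaluation the detected pairs would come from alignment hits above some score threshold and the reference pairs from a structural classification; both are replaced by toy labels here.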
Pages: 323-330
Number of pages: 8