Information content of protein sequences

Times cited: 104
Authors
Weiss, O [1]
Jiménez-Montaño, MA
Herzel, H
Affiliations
[1] Humboldt Univ, Inst Theoret Biol, Invalidenstr 43, D-10115 Berlin, Germany
[2] Univ Amer Puebla Santa Catarina Martir, Puebla 7280, Mexico
DOI
10.1006/jtbi.2000.2138
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is on the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides. © 2000 Academic Press.
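The comparison described in the abstract (entropy and compressibility of a sequence versus a randomized surrogate) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the plug-in entropy estimator is the simplest possible choice and carries exactly the finite-sample bias the abstract warns about, and `bz2` is used only as a convenient stand-in for whatever compressors the paper applied.

```python
import bz2
import math
import random
from collections import Counter


def block_entropy(seq, n):
    """Plug-in Shannon entropy (in bits) of overlapping n-letter words.

    Assumption: naive frequency-count estimator; it underestimates the
    true entropy when the number of possible words is large relative
    to the sequence length.
    """
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compressed_bits_per_symbol(seq):
    """Size of the bz2-compressed sequence, in bits per letter.

    Assumption: bz2 is an illustrative compressor, used here as a rough
    upper bound on the algorithmic complexity per symbol.
    """
    return 8 * len(bz2.compress(seq.encode("ascii"))) / len(seq)


def shuffled_surrogate(seq, rng):
    """Random surrogate with the same letter composition as seq."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input, not real protein data: a strongly repetitive string.
    toy = "ACDE" * 500
    surrogate = shuffled_surrogate(toy, rng)
    for name, s in (("toy", toy), ("surrogate", surrogate)):
        print(name, block_entropy(s, 1), block_entropy(s, 2),
              compressed_bits_per_symbol(s))
```

Shuffling preserves the single-letter frequencies, so any difference between a sequence and its surrogate in the two-letter entropy or in compressed size reflects correlations between positions. For real protein data the abstract reports that this difference is only about 1%, far smaller than in the deliberately repetitive toy string used here.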
Pages: 379-386
Page count: 8