Globally, unrelated protein sequences appear random

Cited by: 11
Authors
Lavelle, Daniel T. [1]
Pearson, William R. [1]
Affiliation
[1] Univ Virginia, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
Keywords
SECONDARY STRUCTURE PREDICTION; AMINO-ACID
DOI
10.1093/bioinformatics/btp660
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino-acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.

Results: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino-acid word clumps (4.7% of 5-mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4-mers) to 0.5% (5-mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate (FDR) q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins; however, four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structure (though not in beta-strands), and five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias toward alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.
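The core test described in the abstract, comparing observed word frequencies with a random-sequence expectation and then controlling for multiple testing with FDR q-values, can be illustrated with a small Python sketch. This is not the authors' pipeline, and several of its choices are assumptions. It takes MC(0) to mean a zero-order model in which residues are independent and drawn from the pooled amino acid composition. It counts overlapping word occurrences rather than the paper's word clumps, and it scores each word with a Poisson upper-tail p-value, which is our choice. It also uses Benjamini-Hochberg adjusted p-values as q-values over the observed words only, because the abstract does not name the exact procedure. All function names (`word_counts`, `mc0_expected`, `poisson_upper_p`, `bh_qvalues`, `exceptional_words`) are hypothetical.

```python
import math
from collections import Counter


def word_counts(seqs, k):
    """Count overlapping k-letter words (a simplification: the paper counts word clumps)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def mc0_expected(seqs, k):
    """Return a function giving a word's expected count under an ASSUMED zero-order
    model: residues are independent and drawn from the pooled composition."""
    comp = Counter()
    for s in seqs:
        comp.update(s)
    total = sum(comp.values())
    freq = {a: c / total for a, c in comp.items()}
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)

    def expected(word):
        p = 1.0
        for a in word:
            p *= freq.get(a, 0.0)
        return n_windows * p

    return expected


def poisson_upper_p(observed, mu):
    """P(X >= observed) for X ~ Poisson(mu); terms are computed in log space."""
    if observed <= 0:
        return 1.0
    if mu <= 0:
        return 0.0

    def pmf(i):
        return math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))

    if observed <= mu:
        # The upper tail is large here, so 1 - (lower tail) is accurate enough.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(observed)))
    # observed > mu: terms shrink monotonically, so sum until they are negligible.
    total, i = 0.0, observed
    while True:
        term = pmf(i)
        total += term
        if term <= 1e-16 * total:
            break
        i += 1
    return min(1.0, total)


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, used here as q-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def exceptional_words(seqs, k, min_fold=2.0, max_q=0.05):
    """Words at least min_fold overrepresented versus MC(0) with q <= max_q."""
    obs = word_counts(seqs, k)
    expected = mc0_expected(seqs, k)
    words = sorted(obs)
    exps = [expected(w) for w in words]
    qvals = bh_qvalues([poisson_upper_p(obs[w], e) for w, e in zip(words, exps)])
    hits = []
    for w, e, q in zip(words, exps, qvals):
        fold = obs[w] / e if e > 0 else math.inf
        if fold >= min_fold and q <= max_q:
            hits.append((w, obs[w], e, fold, q))
    return sorted(hits, key=lambda h: (h[4], h[0]))
```

For example, `exceptional_words(seqs, 4)` returns `(word, observed, expected, fold, q)` tuples for 4-mers that are at least 2-fold overrepresented with q ≤ 0.05. With short or highly repetitive input, nearly every observed word can pass, because the MC(0) expectation for any specific 4- or 5-mer is very small.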
Pages: 310-318
Page count: 9