Globally, unrelated protein sequences appear random

Cited by: 11
Authors
Lavelle, Daniel T. [1]
Pearson, William R. [1]
Affiliations
[1] Univ Virginia, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
Keywords
SECONDARY STRUCTURE PREDICTION; AMINO-ACID;
DOI
10.1093/bioinformatics/btp660
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.
Results: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.
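To make the comparison against the simplest random model concrete, the sketch below counts k-letter words in a set of sequences and compares each count with the number expected if residues were drawn independently from the overall amino acid composition, which is one common reading of a zero-order Markov model such as MC(0). It is an illustration only, not the authors' pipeline: it counts overlapping word occurrences rather than word clumps, and the function names (`word_counts`, `mc0_expected`, `overrepresented`) and the toy sequences are assumptions introduced here.

```python
from collections import Counter
from math import prod


def word_counts(seqs, k):
    """Count every overlapping k-letter word in the sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def mc0_expected(seqs, k):
    """Return a function giving the expected count of a word when
    residues are independent draws from the overall composition
    (assumed here to correspond to a zero-order Markov model)."""
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    freq = {a: n / total for a, n in residues.items()}
    # Number of positions at which a k-letter word can start.
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)

    def expected(word):
        return positions * prod(freq.get(c, 0.0) for c in word)

    return expected


def overrepresented(seqs, k, fold=2.0):
    """Map each word seen at least `fold` times more often than
    expected to its observed/expected ratio."""
    observed = word_counts(seqs, k)
    expected = mc0_expected(seqs, k)
    return {w: n / expected(w)
            for w, n in observed.items()
            if n >= fold * expected(w)}


if __name__ == "__main__":
    toy = ["ACACACGT", "GTGTACAC"]  # hypothetical toy input, not real proteins
    print(overrepresented(toy, 2))
```

On real data, a fold threshold alone says nothing about significance; the abstract's false discovery rate q-value analysis addresses that separately and is not reproduced here.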
Pages: 310-318
Page count: 9