Nature of the protein universe

Times cited: 225
Authors
Levitt, Michael [1]
Affiliations
[1] Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA)
Keywords
domain architecture; protein sequence; protein structure; structural genomics; STRUCTURE PREDICTION; ALPHA-LACTALBUMIN; FAMILIES; SEQUENCES; DOMAINS; BIOLOGY; TASSER; PFAM;
DOI
10.1073/pnas.0905029106
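Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes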
07 ; 0710 ; 09 ;
Abstract
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by approximately 15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.
Pages: 11079-11084
Page count: 6