Predicting protein cellular localization using a domain projection method

Times cited: 82
Authors
Mott, R [1]
Schultz, J
Bork, P
Ponting, CP
Affiliations
[1] Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[2] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[3] European Mol Biol Lab, D-69012 Heidelberg, Germany
[4] Max Delbrück Ctr Berlin Buch, D-13092 Berlin, Germany
[5] Univ Oxford, Dept Human Anat & Genet, MRC, Funct Genet Unit, Oxford OX1 3QX, England
DOI
10.1101/gr.96802
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
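The pipeline the abstract describes can be sketched in three steps: treat domain families as nodes linked whenever they co-occur in a protein, measure degrees of separation as shortest-path lengths in that network, and localize a protein from the locale-specific domains it contains. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The domain names, locale labels, and toy proteins are hypothetical. Breadth-first search is used for shortest paths because every co-occurrence link has the same length. The rule in the hypothetical `predict_locale` helper (predict only when all locale-specific domains agree, otherwise abstain) is an assumed simplification. The two-dimensional projection step is not reproduced.

```python
"""Toy sketch of domain co-occurrence analysis and domain-based localization.

All domain names, locale labels, and the agreement rule in predict_locale are
hypothetical illustrations, not the published method or SMART data.
"""
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_cooccurrence_graph(proteins: Dict[str, List[str]]) -> Graph:
    """Link two domain families if they occur together in at least one protein."""
    graph: Graph = {}
    for domains in proteins.values():
        unique = sorted(set(domains))  # repeated copies of a domain count once
        for d in unique:
            graph.setdefault(d, set())  # keep single-domain families as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest-path length (degrees of separation) from source to each reachable domain."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def max_separation(graph: Graph) -> int:
    """Largest shortest-path length between any two connected domains."""
    return max((max(bfs_distances(graph, n).values()) for n in graph), default=0)


def predict_locale(domains: List[str], domain_locale: Dict[str, str]) -> Optional[str]:
    """Assign a locale only if every locale-specific domain in the protein agrees.

    Domains missing from domain_locale (for example multi-locale or
    transmembrane families) are ignored; no locale-specific domain, or a
    conflict between them, yields None (no prediction).
    """
    locales = {domain_locale[d] for d in domains if d in domain_locale}
    return locales.pop() if len(locales) == 1 else None


def evaluate(annotated: Dict[str, Tuple[List[str], str]],
             domain_locale: Dict[str, str]) -> Tuple[float, float]:
    """Return (coverage, accuracy): share of proteins predicted, share of those correct."""
    predicted = correct = 0
    for domains, true_locale in annotated.values():
        guess = predict_locale(domains, domain_locale)
        if guess is None:
            continue
        predicted += 1
        if guess == true_locale:
            correct += 1
    coverage = predicted / len(annotated) if annotated else 0.0
    accuracy = correct / predicted if predicted else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical architectures: A and C never co-occur, but both co-occur with B.
    proteins = {"p1": ["A", "B"], "p2": ["B", "C"], "p3": ["C", "D"], "p4": ["E", "F"]}
    graph = build_cooccurrence_graph(proteins)
    print(bfs_distances(graph, "A")["C"])  # 2: B bridges A and C
    print(max_separation(graph))           # 3: A -> B -> C -> D

    # Hypothetical locale labels; C and F are treated as not locale-specific.
    domain_locale = {"A": "secreted", "B": "secreted", "D": "nuclear", "E": "cytoplasmic"}
    annotated = {
        "q1": (["A", "B"], "secreted"),
        "q2": (["C", "D"], "nuclear"),
        "q3": (["E", "F"], "nuclear"),
        "q4": (["C"], "cytoplasmic"),
    }
    print(evaluate(annotated, domain_locale))  # (0.75, 0.6666666666666666)
```

Abstaining on proteins that lack a locale-specific domain is what separates coverage from accuracy in this sketch: `evaluate` reports the fraction of proteins that receive any prediction and, separately, the fraction of those predictions that are correct, which is the same split as the abstract's accuracy and coverage figures.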
Pages: 1168-1174
Page count: 7
Related papers
31 in total
[1] InterPro - an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R; Attwood, TK; Bairoch, A; Bateman, A; Birney, E; Biswas, M; Bucher, P; Cerutti, L; Corpet, F; Croning, MDR; Durbin, R; Falquet, L; Fleischmann, W; Gouzy, J; Hermjakob, H; Hulo, N; Jonassen, I; Kahn, D; Kanapin, A; Karavidopoulou, Y; Lopez, R; Marx, B; Mulder, NJ; Oinn, TM; Pagni, M; Servant, F; Sigrist, CJA; Zdobnov, EM.
BIOINFORMATICS, 2000, 16(12): 1145-1150
[2] Gene Ontology: tool for the unification of biology [J].
Ashburner, M; Ball, CA; Blake, JA; Botstein, D; Butler, H; Cherry, JM; Davis, AP; Dolinski, K; Dwight, SS; Eppig, JT; Harris, MA; Hill, DP; Issel-Tarver, L; Kasarskis, A; Lewis, S; Matese, JC; Richardson, JE; Ringwald, M; Rubin, GM; Sherlock, G.
NATURE GENETICS, 2000, 25(1): 25-29
[3] The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A; Apweiler, R.
NUCLEIC ACIDS RESEARCH, 2000, 28(1): 45-48
[4] Bateman, A.
NUCLEIC ACIDS RESEARCH, 2004, 32: D138. DOI: 10.1093/nar/gkh121
[5] The apoptosis inhibitor gene API2 and a novel 18q gene, MLT, are recurrently rearranged in the t(11;18)(q21;q21) associated with mucosa-associated lymphoid tissue lymphomas [J].
Dierlamm, J; Baens, M; Wlodarska, I; Stefanova-Ouzounova, M; Hernandez, JM; Hossfeld, DK; De Wolf-Peeters, C; Hagemeijer, A; Van den Berghe, H; Marynen, P.
BLOOD, 1999, 93(11): 3601-3609
[6] Dijkstra, EW.
NUMERISCHE MATHEMATIK, 1959, 1: 269. DOI: 10.1007/BF01386390
[7] A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome [J].
Drawid, A; Gerstein, M.
JOURNAL OF MOLECULAR BIOLOGY, 2000, 301(4): 1059-1075
[8] Wanted: subcellular localization of proteins based on sequence [J].
Eisenhaber, F; Bork, P.
TRENDS IN CELL BIOLOGY, 1998, 8(4): 169-170
[9] Evaluation of human-readable annotation in biomolecular sequence databases with biological rule libraries [J].
Eisenhaber, F; Bork, P.
BIOINFORMATICS, 1999, 15(7-8): 528-535
[10] Algorithm 97: Shortest path [J].
Floyd, RW.
COMMUNICATIONS OF THE ACM, 1962, 5(6): 345