Annotation transfer between genomes: Protein-protein interologs and protein-DNA regulogs

Cited by: 396
Authors
Yu, HY
Luscombe, NM
Lu, HX
Zhu, XW
Xia, Y
Han, JDJ
Bertin, N
Chung, S
Vidal, M
Gerstein, M [1]
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[3] Harvard Univ, Sch Med, Dana Farber Canc Inst, Boston, MA 02115 USA
Keywords
DOI
10.1101/gr.1774904
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Proteins function mainly through interactions, especially with DNA and other proteins. While some large-scale interaction networks are now available for a number of model organisms, their experimental generation remains difficult. Consequently, interolog mapping (the transfer of interaction annotation from one organism to another using comparative genomics) is of significant value. Here we quantitatively assess the degree to which interologs can be reliably transferred between species as a function of the sequence similarity of the corresponding interacting proteins. Using interaction information from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Helicobacter pylori, we find that protein-protein interactions can be transferred when a pair of proteins has a joint sequence identity >80% or a joint E-value <10^-70. (These "joint" quantities are the geometric means of the identities or E-values for the two pairs of interacting proteins.) We generalize our interolog analysis to protein-DNA binding, finding such interactions are conserved at specific thresholds between 30% and 60% sequence identity depending on the protein family. Furthermore, we introduce the concept of a "regulog": a conserved regulatory relationship between proteins across different species. We map interologs and regulogs from yeast to a number of genomes with limited experimental annotation (e.g., Arabidopsis thaliana) and make these available through an online database at http://interolog.gersteinlab.org. Specifically, we are able to transfer ~90,000 potential protein-protein interactions to the worm. We test a number of these in two-hybrid experiments and are able to verify 45 overlaps, which we show to be statistically significant.
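The "joint" quantities defined in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the percent scale for identities, and the default arguments are assumptions made for illustration. Only the geometric-mean definition and the two thresholds (>80% joint identity, <10^-70 joint E-value) come from the abstract. The E-value mean is computed in log space because multiplying two very small E-values directly can underflow to zero.

```python
import math


def joint_identity(identity_a: float, identity_b: float) -> float:
    """Geometric mean of two percent identities (each on a 0-100 scale)."""
    return math.sqrt(identity_a * identity_b)


def joint_evalue(evalue_a: float, evalue_b: float) -> float:
    """Geometric mean of two E-values, computed in log space."""
    if evalue_a == 0.0 or evalue_b == 0.0:
        # An E-value reported as 0 forces the geometric mean to 0.
        return 0.0
    mean_log = (math.log10(evalue_a) + math.log10(evalue_b)) / 2.0
    return 10.0 ** mean_log


def is_transferable(
    identity_a: float,
    identity_b: float,
    evalue_a: float,
    evalue_b: float,
    min_joint_identity: float = 80.0,  # threshold from the abstract
    max_joint_evalue: float = 1e-70,   # threshold from the abstract
) -> bool:
    """Apply the abstract's rule: joint identity >80% OR joint E-value <10^-70.

    The arguments describe the two ortholog pairs: protein A with its
    ortholog A', and protein B with its ortholog B' (hypothetical naming).
    """
    return (
        joint_identity(identity_a, identity_b) > min_joint_identity
        or joint_evalue(evalue_a, evalue_b) < max_joint_evalue
    )


if __name__ == "__main__":
    # Hypothetical similarity values for the two ortholog pairs.
    print(joint_identity(90.0, 72.0))
    print(is_transferable(90.0, 72.0, 1e-40, 1e-50))
```

Because the geometric mean is pulled down by the weaker of the two values, one highly conserved partner cannot compensate for a poorly conserved one: identities of 100% and 64% give a joint identity of exactly 80%, which does not exceed the threshold.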
Pages: 1107-1118
Page count: 12