DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases

Cited by: 16
Authors
Zhang, Wangshu [1, 2]
Chen, Yong [1, 2]
Sun, Fengzhu [1, 2, 3]
Jiang, Rui [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Automat, TNLIST, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Automat, TNLIST, Bioinformat Div, Beijing 100084, Peoples R China
[3] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
Funding
US National Science Foundation; Specialized Research Fund for the Doctoral Program of Higher Education
Keywords
GENOME-WIDE ASSOCIATION; I ALPHA-GENE; SUSCEPTIBILITY LOCI; DATABASE; NETWORK; CANCER; TOOL; IDENTIFICATION; EXPRESSION; PEPTIDE;
DOI
10.1186/1752-0509-5-55
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Domains are the basic units of proteins, so exploring associations between protein domains and human inherited diseases should improve our understanding of the pathogenesis of human complex diseases and benefit their prevention, diagnosis and treatment. We assume that, within a given domain-domain interaction network, similarities between disease phenotypes can be explained by the proximity of the domains associated with those diseases. Based on this assumption, we propose a Bayesian regression approach named "domainRBF" (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases.
Results: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), evaluated by three criteria (precision, mean rank ratio, and AUC score). Through a series of permutation tests, we further show that the approach is robust to the parameters involved and to the underlying domain-domain interaction network. Having assessed the validity of the approach, we show that ab initio inference of domain-disease and gene-disease associations is possible, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource.
Conclusions: The proposed approach effectively ranks susceptible domains near the top of the candidate list, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive view of associations between domains and human diseases.
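The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions: precision is taken as the fraction of validation runs in which the held-out domain ranks first, rank ratio as the rank divided by the number of candidates, and AUC as the mean per-run area under the ROC curve for a single positive among the candidates. The function names and the toy scores are hypothetical and do not come from the paper.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def rank_of_held_out(scores: Sequence[float], held_out: int) -> float:
    """1-based rank of the held-out candidate; higher score is better.

    Ties receive the average of the ranks they span (an assumption).
    """
    target = scores[held_out]
    greater = sum(1 for s in scores if s > target)
    ties = sum(1 for s in scores if s == target) - 1  # exclude the target itself
    return 1 + greater + ties / 2


def evaluate(runs: List[Tuple[Sequence[float], int]]) -> Dict[str, float]:
    """Summarize leave-one-out runs.

    Each run is (scores of all candidates, index of the held-out true domain).
    Every run must contain at least two candidates.
    """
    ranks = [(rank_of_held_out(scores, idx), len(scores)) for scores, idx in runs]
    return {
        # Fraction of runs where the true domain is ranked first.
        "precision": mean(1.0 if r == 1 else 0.0 for r, _ in ranks),
        # Average of rank / number of candidates; lower is better.
        "mean_rank_ratio": mean(r / n for r, n in ranks),
        # With one positive, per-run AUC is the share of negatives ranked below it.
        "auc": mean((n - r) / (n - 1) for r, n in ranks),
    }


# Toy example with made-up scores: two validation runs.
toy_runs = [
    ([0.9, 0.1, 0.5], 0),       # true domain ranked 1st of 3
    ([0.2, 0.8, 0.4, 0.6], 3),  # true domain ranked 2nd of 4
]
summary = evaluate(toy_runs)
```

In this sketch a method that always ranks the true domain first would give a precision of 1, a mean rank ratio of 1/n, and an AUC of 1; the paper's exact definitions may differ.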
Pages: 21