A partially supervised classification approach to dominant and recessive human disease gene prediction

Cited by: 17
Authors
Calvo, Borja
Lopez-Bigas, Nuria
Furney, Simon J.
Larranaga, Pedro
Lozano, Jose A.
Affiliations
[1] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, Intelligent Syst Grp, E-20018 San Sebastian, Spain
[2] Univ Pompeu Fabra, Res Unit Biomed Informat, E-08003 Barcelona, Spain
Keywords
partially supervised classification; disease gene prediction; dominant disease gene; recessive disease gene
DOI
10.1016/j.cmpb.2006.12.003
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Discovering the genes involved in genetic diseases is an important step towards understanding the nature of these diseases. In-lab identification is a difficult, time-consuming task for which computational methods can be very useful: in silico identification algorithms can guide future studies. Previous work on this topic has not taken into account that no reliable set of negative examples is available, since it is not possible to ensure that a given gene is unrelated to any genetic disease. In this paper, we take this characteristic of the problem into account and approach identification as a partially supervised classification problem. In addition, we develop a more specific method for identifying disease genes by classifying, for the first time, genes causing dominant and recessive diseases independently. We base this separation on previous results showing that these two types of genes differ in their sequence properties. We apply a new model averaging algorithm to the identification of human genes associated with both dominant and recessive Mendelian diseases. © 2006 Elsevier Ireland Ltd. All rights reserved.
Pages: 229-237
Number of pages: 9