A partially supervised classification approach to dominant and recessive human disease gene prediction

Cited by: 17

Authors:

Calvo, Borja

Lopez-Bigas, Nuria

Furney, Simon J.

Larranaga, Pedro

Lozano, Jose A.

Affiliations:

[1] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, Intelligent Syst Grp, E-20018 San Sebastian, Spain

[2] Univ Pompeu Fabra, Res Unit Biomed Informat, E-08003 Barcelona, Spain

Source:

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE | 2007 / Vol. 85 / Issue 03

Keywords:

partially supervised classification; disease gene prediction; dominant disease gene; recessive disease gene;

DOI:

10.1016/j.cmpb.2006.12.003

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The discovery of the genes involved in genetic diseases is an important step towards understanding the nature of these diseases. Laboratory identification is a difficult, time-consuming task in which computational methods can be very useful: in silico identification algorithms can guide future studies. Previous work on this topic has not taken into account that no reliable sets of negative examples are available, since it is not possible to ensure that a given gene is unrelated to any genetic disease. In this paper, this characteristic of the problem is taken into account, and identification is approached as a partially supervised classification problem. In addition, we have developed a more specific method that identifies disease genes by classifying, for the first time, genes causing dominant and recessive diseases independently. We base this separation on previous results showing that these two types of genes differ in their sequence properties. We have applied a new model averaging algorithm to the identification of human genes associated with both dominant and recessive Mendelian diseases. (c) 2006 Elsevier Ireland Ltd. All rights reserved.
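
To make the partially supervised setting concrete, the following is a minimal sketch of learning from positive and unlabeled examples only. It is not the model averaging algorithm of the paper: it swaps in a generic heuristic (train a "labeled vs. unlabeled" classifier, then rescale its output by the average score on the known positives), with a one-feature Gaussian naive Bayes as the classifier. The gene identifiers, the single numeric feature, and all function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(values):
    """Return (mean, std) of a list of floats; std is floored to avoid zero."""
    return mean(values), max(pstdev(values), 1e-6)


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def train_labeled_vs_unlabeled(pos_values, unl_values):
    """Fit a one-feature Gaussian naive Bayes that estimates P(labeled | x)."""
    prior = len(pos_values) / (len(pos_values) + len(unl_values))
    pos_params = fit_gaussian(pos_values)
    unl_params = fit_gaussian(unl_values)

    def prob_labeled(x):
        p = prior * gaussian_pdf(x, *pos_params)
        u = (1.0 - prior) * gaussian_pdf(x, *unl_params)
        return p / (p + u)

    return prob_labeled


def rank_unlabeled(positives, unlabeled):
    """Rank unlabeled genes by an estimated probability of being positive.

    positives, unlabeled: dict mapping a (hypothetical) gene id to one feature.
    No gene is ever treated as a confirmed negative.
    """
    prob_labeled = train_labeled_vs_unlabeled(
        list(positives.values()), list(unlabeled.values())
    )
    # Assumption: the average score on known positives approximates the
    # probability that a true positive carries a label. A careful version
    # would estimate this on held-out positives, not the training ones.
    c = mean(prob_labeled(x) for x in positives.values())
    scores = {g: min(1.0, prob_labeled(x) / c) for g, x in unlabeled.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Toy data: known disease genes cluster near 5.0 on the made-up feature;
# two unlabeled genes (U6, U7) look like them, the rest do not.
known = {"P1": 4.8, "P2": 5.0, "P3": 5.2, "P4": 5.1, "P5": 4.9}
candidates = {"U1": 0.9, "U2": 1.0, "U3": 1.1, "U4": 1.2, "U5": 0.8,
              "U6": 5.05, "U7": 4.95, "U8": 1.05}
ranked, scores = rank_unlabeled(known, candidates)
print(ranked[:2])
```

The point of the sketch is only that unlabeled genes are ranked rather than assumed to be negatives; a realistic study would use many sequence-derived features and a properly validated model.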


Pages: 229-237

Number of pages: 9
