Logic based methods for SNPs tagging and reconstruction

被引:6
作者
Bertolazzi, Paola [1 ]
Felici, Giovanni [1 ]
Festa, Paola [2 ]
机构
[1] CNR, Ist Anal Sistemi & Informat, I-00185 Rome, Italy
[2] Univ Naples Federico II, Dipartimento Matemat & Applicaz R Caccioppoli, Compl MSA, I-80126 Naples, Italy
关键词
Tag SNPs selection; Feature selection; Set covering heuristics; Logic programming; SELECTION; BLOCKS; SET;
D O I
10.1016/j.cor.2009.10.008
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
SNPs are positions of the DNA sequences where the differences among individuals are embedded. The knowledge of such SNPs is crucial for disease association studies, but even if the number of such positions is low (about 1% of the entire sequence), the cost to extract the complete information is actually very high. Recent studies have shown that DNA sequences are structured into blocks of positions, that are conserved during evolution, where there is strong correlation among values (alleles) of different loci. To reduce the cost of extracting SNPs information, the block structure of the DNA has suggested to limit the process to a subset of SNPs, the so-called Tag SNPs, that are able to maintain the most of the information contained in the whole sequence. In this paper, we apply a technique for feature selection based on integer programming to the problem of Tag SNP selection. Moreover, to test the quality of our approach, we consider also the problem of SNPs reconstruction, i.e. the problem of deriving unknown SNPs from the value of Tag SNPs and propose two reconstruction methods, one based on a majority vote and the other on a machine learning approach. We test our algorithm on two public data sets of different nature, providing results that are, when comparable, in line with the related literature. One of the interesting aspects of the proposed method is to be found in its capability to deal simultaneously with very large SNPs sets, and, in addition, to provide highly informative reconstruction rules in the form of logic formulas. (C) 2009 Elsevier Ltd. All rights reserved.
引用
收藏
页码:1419 / 1426
页数:8
相关论文
共 25 条
[1]  
[Anonymous], 1979, COMPUT INTRACTABILIT
[2]  
[Anonymous], ENCY DATA WAREHOUSIN
[3]   Logic classification and feature selection for biomedical data [J].
Bertolazzi, P. ;
Felici, G. ;
Festa, P. ;
Lancia, G. .
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (05) :889-899
[4]   Logical analysis of binary data with missing bits [J].
Boros, E ;
Ibaraki, T ;
Makino, K .
ARTIFICIAL INTELLIGENCE, 1999, 107 (02) :219-263
[5]  
BOROS E, 1996, 2996 RUTCOR RUTG U
[6]  
de Angelis V, 2006, DATA MINING AND KNOWLEDGE DISCOVERY APPROACHES BASED ON RULE INDUCTION TECHNIQUES, P227
[7]  
Felici G, 2001, INFORMS J COMPUT, V13, P1
[8]   GREEDY RANDOMIZED ADAPTIVE SEARCH PROCEDURES [J].
FEO, TA ;
RESENDE, MGC .
JOURNAL OF GLOBAL OPTIMIZATION, 1995, 6 (02) :109-133
[9]   A PROBABILISTIC HEURISTIC FOR A COMPUTATIONALLY DIFFICULT SET COVERING PROBLEM [J].
FEO, TA ;
RESENDE, MGC .
OPERATIONS RESEARCH LETTERS, 1989, 8 (02) :67-71
[10]  
Festa P, 2002, OPER RES COMPUT SCI, V15, P325