Stochastic local search for the FEATURE SET problem, with applications to microarray data

被引:25
作者
Albrecht, Andreas A. [1 ]
机构
[1] Univ Hertfordshire, Sch Comp Sci, Hatfield AL10 9AB, Herts, England
关键词
FEATURE SET problem; parameterized complexity; Stochastic local search; simulated annealing; gene-expression analysis; microarrays;
D O I
10.1016/j.amc.2006.05.128
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We prove a (m/delta)(O(kappa)) (.) n(a) time bound for finding minimum solutions S-min of FEATURE SET problems, where n is the total size of a given FEATURE SET problem, K <= vertical bar S-min vertical bar, m equals the number of non-target features, a is a (relatively small) constant, and 1 - delta is the confidence that the solution is of minimum length. In terms of parameterized complexity of NP-complete problems, our time bound differs from an FPT-type bound by the factor m(O(kappa)) for fixed delta. The algorithm is applied to a prominent microarray dataset: The classification of gene-expression data related to acute myeloid leukaemia (AML) and acute lymphoblastic leukaemia (ALL). From the set of potentially significant features calculated by the algorithm we can identify three genes (D88422, M92287, L09209) that produce zero errors on the test set by using a simple, straightforward evaluation procedure (performing the test on the single gene M84526 produces only one error). (c) 2006 Elsevier Inc. All rights reserved.
引用
收藏
页码:1148 / 1164
页数:17
相关论文
共 25 条
[1]   An epicurean learning approach to gene-expression data classification [J].
Albrecht, A ;
Vinterbo, SA ;
Ohno-Machado, L .
ARTIFICIAL INTELLIGENCE IN MEDICINE, 2003, 28 (01) :75-87
[2]  
Albrecht AA, 2004, LECT NOTES COMPUT SC, V3045, P405
[3]   Selection of relevant features and examples in machine learning [J].
Blum, AL ;
Langley, P .
ARTIFICIAL INTELLIGENCE, 1997, 97 (1-2) :245-271
[4]   Feature selection for consistent biclustering via fractional 0-1 programming [J].
Busygin, S ;
Prokopyev, OA ;
Pardalos, PM .
JOURNAL OF COMBINATORIAL OPTIMIZATION, 2005, 10 (01) :7-21
[7]   Biomarker discovery in microarray gene expression data with Gaussian processes [J].
Chu, W ;
Ghahramani, Z ;
Falciani, F ;
Wild, DL .
BIOINFORMATICS, 2005, 21 (16) :3385-3393
[8]   The k-Feature Set problem is W[2]-complete [J].
Cotta, C ;
Moscato, P .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 67 (04) :686-690
[9]  
Davies S., 1994, P AAAI S REL, P41
[10]  
DOWNEY R, 1998, PARAMETERIZED COMPLE