A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Cited by: 39
Authors
Mahmoud, Osama [1, 3]
Harrison, Andrew [1]
Perperoglou, Aris [1]
Gul, Asma [1]
Khan, Zardad [1]
Metodiev, Metodi V. [2]
Lausen, Berthold [1]
Affiliations
[1] Univ Essex, Dept Math Sci, Colchester CO4 3SQ, Essex, England
[2] Univ Essex, Prote Unit, Sch Biol Sci, Colchester CO4 3SQ, Essex, England
[3] Helwan Univ, Dept Appl Stat, Cairo, Egypt
Funding
UK Economic and Social Research Council (ESRC);
Keywords
Feature selection; Gene ranking; Microarray classification; Proportional overlap score; Gene mask; Minimum subset of genes; MICROARRAY DATA; GENE SELECTION; EXPRESSION; CANCER; IDENTIFICATION; PREDICTION;
DOI
10.1186/1471-2105-15-274
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task.
Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance.
Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
Pages: 20