Feature Selection for Classification of Hyperspectral Data by SVM

Cited by: 635

Authors:

Pal, Mahesh [1]

Foody, Giles M. [2]

Affiliations:

[1] Natl Inst Technol, Dept Civil Engn, Kurukshetra 136119, Haryana, India

[2] Univ Nottingham, Sch Geog, Nottingham NG7 2RD, England

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2010 / Vol. 48 / No. 05

Keywords:

Classification accuracy; feature selection; Hughes phenomenon; hyperspectral data; support vector machines (SVM); REMOTE-SENSING IMAGES; GENE SELECTION; SAMPLE-SIZE; ACCURACY;

DOI:

10.1109/TGRS.2009.2039484

Chinese Library Classification number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708; 070902;

Abstract:

Support vector machines (SVM) are attractive for the classification of remotely sensed data, with some claims that the method is insensitive to the dimensionality of the data and therefore does not require a dimensionality-reduction analysis in preprocessing. Here, a series of classification analyses with two hyperspectral sensor data sets reveals that the accuracy of a classification by an SVM does vary as a function of the number of features used. Critically, it is shown that the accuracy of a classification may decline significantly (at the 0.05 level of statistical significance) with the addition of features, particularly if a small training sample is used. This highlights a dependence of the accuracy of classification by an SVM on the dimensionality of the data and, therefore, the potential value of undertaking a feature-selection analysis prior to classification. Additionally, it is demonstrated that, even when a large training sample is available, feature selection may still be useful. For example, the accuracy derived from the use of a small number of features may be non-inferior (at the 0.05 level of significance) to that derived from the use of a larger feature set, providing potential advantages in relation to issues such as data storage and computational processing costs. Feature selection may, therefore, be a valuable analysis to include in preprocessing operations for classification by an SVM.
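The non-inferiority claim in the abstract can be illustrated with a small worked example. The sketch below is not the authors' procedure. It assumes the two accuracies come from independent test samples, uses a normal approximation for the difference of two proportions, and takes a hypothetical non-inferiority margin chosen by the analyst. The function name `noninferiority_check` and all numbers are illustrative only. If both classifications are evaluated on the same test pixels, a related-samples method would be more appropriate.

```python
import math
from statistics import NormalDist


def noninferiority_check(correct_small, n_small, correct_full, n_full,
                         margin=0.02, alpha=0.05):
    """One-sided non-inferiority check for two classification accuracies.

    Assumptions (not taken from the paper): independent test samples,
    normal approximation, and an analyst-chosen margin.
    Returns (difference, lower confidence bound, non-inferior flag).
    """
    p_small = correct_small / n_small   # accuracy with the reduced feature set
    p_full = correct_full / n_full      # accuracy with the full feature set
    diff = p_small - p_full

    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_small * (1 - p_small) / n_small
                   + p_full * (1 - p_full) / n_full)

    # One-sided critical value, e.g. about 1.645 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se

    # Non-inferior if the lower bound stays above the negative margin.
    return diff, lower, lower > -margin


# Hypothetical counts: 900/1000 correct with few features, 905/1000 with all.
diff, lower, ok = noninferiority_check(900, 1000, 905, 1000, margin=0.05)
print(f"difference={diff:.4f}, lower bound={lower:.4f}, non-inferior={ok}")
```

The conclusion depends on the margin: with the same hypothetical counts, a stricter margin can reverse the decision, which is why the margin should be fixed before the comparison is made.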

Pages: 2297-2307

Number of pages: 11
