Joint feature selection and classification for taxonomic problems within fish species complexes

Times cited: 3
Authors
Chen, Yixin [1]
Huang, Shuqing [2]
Chen, Huimin [3]
Bart, Henry L., Jr. [4,5]
Affiliations
[1] Univ Mississippi, Dept Comp & Informat Sci, University, MS 38677 USA
[2] Gen Dynam Corp, New Orleans, LA 70123 USA
[3] Univ New Orleans, Dept Elect Engn, New Orleans, LA 70148 USA
[4] Tulane Univ, Dept Ecol & Evolutionary Biol, New Orleans, LA 70118 USA
[5] Tulane Univ, Museum Nat Hist, Belle Chasse, LA 70037 USA
Funding
U.S. National Science Foundation
Keywords
Feature selection; False discovery rate; Logistic regression; Taxonomy; Systematics; IMPEDIMENT
DOI
10.1007/s10044-009-0157-y
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is estimated that 90% of the world's species have yet to be discovered and described. The main reason for the slow pace of new species description is that taxonomy can be very laborious. To formally describe a new species, taxonomists must manually gather and analyze data from large numbers of specimens and identify the smallest subset of external body characters that uniquely diagnoses the new species as distinct from all its known relatives. In this paper, we present an automated feature selection and classification scheme that uses logistic regression with a controlled false discovery rate to address this taxonomic impediment to new species discovery. Unlike traditional taxonomic practice, our scheme automatically selects landmark-based body shape features from specimen samples, choosing features that unite populations within a species as well as features that distinguish among species. It also provides a probabilistic assessment of the classification accuracy achieved when the selected features are used to identify new species. We apply the scheme to a taxonomic problem involving species of suckers in the genus Carpiodes. The results confirm the necessity of feature selection for classifier design and provide additional insight into suspect specimens that have traditionally been misdiagnosed as C. carpio but are in fact closer to C. cyprinus. We also compare the classification accuracy of our scheme with that of several well-known machine learning algorithms, with and without feature selection.
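The abstract does not spell out how the false discovery rate is controlled inside the logistic regression scheme. The following is a minimal sketch of one common construction, under stated assumptions: each landmark-derived feature is screened with a univariate logistic regression, its slope is tested with a Wald z-test, and the resulting p-values are passed through the standard Benjamini-Hochberg step-up procedure at FDR level q. The function names (logistic_wald_p_value, benjamini_hochberg, select_features) and the univariate screening design are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: per-feature univariate logistic screening plus
# Benjamini-Hochberg is an assumed construction, not the paper's exact method.


def logistic_wald_p_value(x: Sequence[float], y: Sequence[int], iters: int = 25) -> float:
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson and return the
    two-sided Wald p-value for H0: b1 = 0 (feature carries no class signal)."""
    b0 = b1 = 0.0
    for step in range(iters + 1):
        # Accumulate the score vector g and the 2x2 Fisher information H.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if step == iters:
            break  # keep H evaluated at the final estimate for the standard error
        # Newton update: beta <- beta + H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # square root of the (b1, b1) entry of H^{-1}
    z = b1 / se_b1
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Benjamini-Hochberg step-up rule: with the m p-values sorted ascending,
    find the largest rank k with p_(k) <= k * q / m and reject the k hypotheses
    with the smallest p-values. Returns the rejected indices in ascending order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # step-up: a later rank can pass even if an earlier one failed
    return sorted(order[:k_max])


def select_features(columns: Sequence[Sequence[float]], y: Sequence[int],
                    q: float = 0.05) -> List[int]:
    """Screen each feature column with the Wald test above, then keep the
    features whose tests survive Benjamini-Hochberg at FDR level q."""
    p_values = [logistic_wald_p_value(col, y) for col in columns]
    return benjamini_hochberg(p_values, q)
```

For example, with y = [0]*5 + [1]*5, a feature whose values are identical in both classes yields a fitted slope of exactly zero and a p-value of 1.0, so select_features drops it. Benjamini-Hochberg bounds the expected proportion of falsely selected features among those selected, which is less conservative than a Bonferroni-style family-wise bound when many landmark features are screened. The Wald test assumes the two classes are not perfectly separated by a single feature; under complete separation the maximum-likelihood slope does not exist and this sketch would not return a meaningful p-value.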
Pages: 23-34
Number of pages: 12