Comparisons of likelihood and machine learning methods of individual classification

Cited by: 30
Authors
Guinand, B
Topchy, A
Page, KS
Burnham-Curtis, MK
Punch, WF
Scribner, KT [1]
Affiliations
[1] Michigan State Univ, Dept Wildlife & Fisheries, E Lansing, MI 48824 USA
[2] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[3] USGS Great Lakes Sci Ctr, Ann Arbor, MI 48105 USA
[4] Natl Fish & Wildlife Forens Lab, US Fish & Wildlife Serv, Ashland, OR 97520 USA
DOI
10.1093/jhered/93.4.260
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09
Abstract
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high F_ST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. For simulated data sets, classification error rates associated with artificial neural networks and likelihood estimators were lower than those of k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Because of the intricacies involved in developing and evaluating artificial neural networks, likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to their population of origin.
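To make the idea of a likelihood-based assignment test concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common frequency-based formulation: allele frequencies are estimated per reference population, genotype likelihoods are computed under Hardy-Weinberg proportions with independent loci, and the individual is assigned to the population with the highest log-likelihood. The function names, the pseudo-count used to avoid zero frequencies, and the toy genotypes are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def allele_counts(genotypes, n_loci):
    """Count alleles at each locus across a list of diploid genotypes."""
    counts = [Counter() for _ in range(n_loci)]
    for genotype in genotypes:
        for locus, (a, b) in enumerate(genotype):
            counts[locus][a] += 1
            counts[locus][b] += 1
    return counts


def log_likelihood(genotype, counts, alleles, pseudo=1.0):
    """Log-likelihood of a genotype given one population's allele counts.

    Assumes Hardy-Weinberg proportions and independence among loci.
    `pseudo` is an illustrative smoothing constant (an assumption, not
    taken from the source) that keeps unseen alleles from having
    zero frequency.
    """
    ll = 0.0
    for locus, (a, b) in enumerate(genotype):
        c = counts[locus]
        total = sum(c.values()) + pseudo * len(alleles[locus])
        pa = (c[a] + pseudo) / total
        pb = (c[b] + pseudo) / total
        # Homozygote: p^2; heterozygote: 2pq.
        ll += math.log(pa * pa if a == b else 2.0 * pa * pb)
    return ll


def assign(genotype, reference, pseudo=1.0):
    """Return (best population, {population: log-likelihood})."""
    n_loci = len(genotype)
    counts = {pop: allele_counts(gs, n_loci) for pop, gs in reference.items()}
    # Allele set per locus: pooled over all populations plus the query.
    alleles = [set(genotype[locus]) for locus in range(n_loci)]
    for pop_counts in counts.values():
        for locus in range(n_loci):
            alleles[locus].update(pop_counts[locus])
    scores = {
        pop: log_likelihood(genotype, pop_counts, alleles, pseudo)
        for pop, pop_counts in counts.items()
    }
    return max(scores, key=scores.get), scores


# Toy reference data: two populations, two loci, integer allele labels.
reference = {
    "A": [((1, 1), (2, 2)), ((1, 2), (2, 2)), ((1, 1), (2, 3))],
    "B": [((3, 3), (4, 4)), ((3, 2), (4, 4)), ((3, 3), (4, 3))],
}
query = ((1, 1), (2, 2))
best_population, scores = assign(query, reference)
print(best_population, scores)
```

In practice, the individual being assigned would normally be left out of the reference sample before frequencies are estimated; that step is omitted here for brevity.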
Pages: 260-269
Page count: 10