Which method predicts recidivism best?: a comparison of statistical, machine learning and data mining predictive models

Cited by: 70
Authors
Tollenaar, N. [1]
van der Heijden, P. G. M. [2]
Affiliations
[1] Minist Secur & Justice, The Hague, Netherlands
[2] Univ Utrecht, NL-3508 TC Utrecht, Netherlands
Keywords
Data mining; Linear discriminant analysis; Logistic regression; Machine learning; Prediction; Predictive performance; Recidivism; CLASSIFICATION; PERFORMANCE; REGRESSION; VIOLENCE;
DOI
10.1111/j.1467-985X.2012.01056.x
Chinese Library Classification number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03; 0303; 0701; 070101
Abstract
Using criminal population conviction histories of recent offenders, prediction models are developed that predict three types of criminal recidivism: general recidivism, violent recidivism and sexual recidivism. The research question is whether prediction techniques from modern statistics, data mining and machine learning provide an improvement in predictive performance over classical statistical methods, namely logistic regression and linear discriminant analysis. These models are compared on a large selection of performance measures. Results indicate that classical methods do equally well as or better than their modern counterparts. The predictive performance of the various techniques differs only slightly for general and violent recidivism, whereas differences are larger for sexual recidivism. For the general and violent recidivism data we present the results of logistic regression and for sexual recidivism of linear discriminant analysis.
Pages: 565-584
Page count: 20
Related papers
47 in total
  • [1] [Anonymous], CARET CLASSIFICATION
  • [2] [Anonymous], 2009, TECHNICAL REPORT
  • [3] [Anonymous], 1985, Encyclopedia of Statistical Sciences
  • [4] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation
    Blewitt, Marnie E.
    Gendrel, Anne-Valerie
    Pang, Zhenyi
    Sparrow, Duncan B.
    Whitelaw, Nadia
    Craig, Jeffrey M.
    Apedaile, Anwyn
    Hilton, Douglas J.
    Dunwoodie, Sally L.
    Brockdorff, Neil
    Kay, Graham F.
    Whitelaw, Emma
    [J]. NATURE GENETICS, 2008, 40 (05) : 663 - 669
  • [5] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [6] Breiman L, 1998, TECHNICAL REPORT
  • [7] Caruana R, 2006, ICML 06: Proceedings of the 23rd International Conference on Machine Learning, P161, DOI 10.1145/1143844.1143865
  • [8] Caruana R., 2004, DATA MINING METRIC S, V69
  • [9] Predicting criminal recidivism: A comparison of neural network models with statistical methods
    Caulkins, J
    Cohen, J
    Gorr, W
    Wei, JF
    [J]. JOURNAL OF CRIMINAL JUSTICE, 1996, 24 (03) : 227 - 240
  • [10] Tree-Based Ranking Methods
    Clemencon, Stephan
    Vayatis, Nicolas
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2009, 55 (09) : 4316 - 4336