Which method predicts recidivism best?: a comparison of statistical, machine learning and data mining predictive models

Cited by: 70
Authors
Tollenaar, N. [1]
van der Heijden, P. G. M. [2]
Affiliations
[1] Ministry of Security and Justice, The Hague, Netherlands
[2] Utrecht University, NL-3508 TC Utrecht, Netherlands
Keywords
Data mining; Linear discriminant analysis; Logistic regression; Machine learning; Prediction; Predictive performance; Recidivism; CLASSIFICATION; PERFORMANCE; REGRESSION; VIOLENCE
DOI
10.1111/j.1467-985X.2012.01056.x
Chinese Library Classification (CLC) numbers
O1 [Mathematics]; C [Social Sciences (General)]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
Using population-wide criminal conviction histories of recent offenders, prediction models are developed for three types of criminal recidivism: general, violent and sexual recidivism. The research question is whether prediction techniques from modern statistics, data mining and machine learning improve predictive performance over the classical statistical methods, namely logistic regression and linear discriminant analysis. The models are compared on a large selection of performance measures. The results indicate that the classical methods perform as well as or better than their modern counterparts. The predictive performance of the various techniques differs only slightly for general and violent recidivism, whereas the differences are larger for sexual recidivism. For the general and violent recidivism data we present the results of logistic regression, and for the sexual recidivism data those of linear discriminant analysis.
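The abstract does not name the performance measures used. As a minimal sketch of how the predicted probabilities of two competing models can be compared, the Python below computes two common measures, chosen here as an assumption for illustration: the area under the ROC curve (AUC, computed as the Mann-Whitney probability that a recidivist is scored above a non-recidivist) and the Brier score. The function names `auc` and `brier` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Returns the probability that a randomly chosen positive case (label 1,
    e.g. reconvicted) receives a higher score than a randomly chosen
    negative case (label 0); tied scores count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy outcomes (1 = reconvicted) and predicted probabilities
    # from two unnamed models; these are NOT data or results from the study.
    y = [0, 0, 1, 1, 0, 1]
    model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    model_b = [0.3, 0.3, 0.3, 0.6, 0.5, 0.4]
    for name, p in [("model_a", model_a), ("model_b", model_b)]:
        print(f"{name}: AUC={auc(y, p):.3f}, Brier={brier(y, p):.3f}")
```

The two measures capture different things: AUC rewards correct ranking of offenders regardless of calibration, while the Brier score also penalizes probabilities that are systematically too high or too low. The pairwise AUC loop costs O(n_pos × n_neg) and is meant only for illustration; a sort-based rank formulation scales better to population-sized offender data.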
Pages: 565-584
Number of pages: 20