Comparison of Decision Tree Algorithms for Predicting Potential Air Pollutant Emissions with Data Mining Models

Cited by: 39
Author
Birant, D. [1]
Affiliation
[1] Dokuz Eylul Univ, Dept Comp Engn, TR-35100 Izmir, Turkey
Keywords
air pollution; data mining; classification and prediction; decision support systems; artificial intelligence; NEURAL-NETWORK; QUALITY; SYSTEM; SO2; NO2
DOI
10.3808/jei.201100186
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Predicting air pollutant emissions from potential industrial installations is important for controlling air pollution and for planning future air quality management. This paper proposes classifying and predicting the emission levels of industrial air pollutant sources using decision tree techniques. It compares eleven decision tree algorithms (C4.5, CART, NBTree, BFTree, LADTree, REPTree, Random Tree, Random Forest, LMT, FT and Decision Stump) in terms of running time, classification accuracy and applicability. Six performance metrics were used in the comparison: classification accuracy, precision, recall, f-measure, mean absolute error and mean squared error. The aim of the study is to determine the best classifier, as a data mining model, for predicting the emission level of an industrial plant (the dependent variable) from known values of the independent variables: the physical region of the plant, the height of the plant, working hours, the height of the stack, the diameter of the stack, the velocity of the waste in the stack, the temperature of the waste in the stack, plume rise, source classification code, control equipment type and emissions method code. In the experimental studies, all of these algorithms were applied to a dataset consisting of sulphur oxide emission levels of industrial pollutants in Izmir. According to the results, the C4.5 algorithm has the highest accuracy, while the Decision Stump algorithm is the fastest. The average classification accuracy of 82.4% empirically shows the benefit of using decision tree techniques for classifying and predicting emission levels.
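The six metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: the labels `low`, `medium` and `high`, the toy predictions, and the helper name `evaluate` are all hypothetical. The sketch also assumes that emission levels are ordinal and that mean absolute error and mean squared error are taken over the distance between level indices; the paper may define these errors differently, for example over predicted class probabilities. Precision, recall and f-measure are macro-averaged over the classes, which is likewise an assumption.

```python
def evaluate(y_true, y_pred, labels):
    """Compute six classification metrics for ordinal class labels.

    `labels` lists the classes from lowest to highest level (assumed ordinal).
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    index = {label: i for i, label in enumerate(labels)}

    # Fraction of samples whose predicted level equals the true level.
    accuracy = sum(t == p for t, p in pairs) / n

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    # Error = signed distance between true and predicted level indices.
    errors = [index[t] - index[p] for t, p in pairs]

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,   # macro average
        "recall": sum(recalls) / k,         # macro average
        "f_measure": sum(f_measures) / k,   # macro average
        "mae": sum(abs(e) for e in errors) / n,
        "mse": sum(e * e for e in errors) / n,
    }


# Hypothetical toy data, not taken from the Izmir dataset.
labels = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]

metrics = evaluate(y_true, y_pred, labels)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running `evaluate` once per classifier on the same held-out data would give one row of a comparison table; running time, the other criterion in the abstract, would have to be measured separately.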
Pages: 46-53
Number of pages: 8