Oblique decision trees for spatial pattern detection: Optimal algorithm and application to malaria risk

Cited by: 5
Authors
Gaudart J. [1]
Poudiougou B. [2, 3]
Ranque S. [2]
Doumbo O. [3]
Affiliations
[1] Medical Statistics and Informatics Research Team, LIF - UMR 6166 - CNRS/Aix-Marseille University, Faculty of Medicine, 13385 Marseille Cedex 05
[2] Immunology and Genetic of Parasitic Diseases, UMR 399 - INSERM/Aix-Marseille University, Faculty of Medicine, 13385 Marseille Cedex 05
[3] Malaria Research and Training Centre, Faculty of Medicine, Pharmacy and Odonto-Stomatology, University of Mali, Bamako
Keywords
Malaria; Potential Cluster; Angular Sector; Circular Window; High Risk Class;
DOI
10.1186/1471-2288-5-22
Abstract
Background: To detect potential disease clusters where a putative source cannot be specified, classical procedures scan the geographical area with circular windows through a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical, and different choices may not provide exactly the same results. The aim of our work was to use an Oblique Decision Tree model (ODT), which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we have developed an ODT-algorithm to find an oblique partition of the space defined by the geographic coordinates.
Methods: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the independent variable. Since this is an NP-hard problem in R^N, classical ODT-algorithms use evolutionary procedures or heuristics. We have developed an optimal ODT-algorithm in R^2, based on the directions defined by each pair of point locations. This partition provided potential clusters, which can be tested with Monte-Carlo inference. We applied the ODT-model to a dataset in order to identify potential high-risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™.
Results: The ODT procedure provided four classes of risk of infection. In the first high-risk class, 60% of the children were infected (95% confidence interval (CI95%) [52.22-67.55]). Monte-Carlo inference showed that the spatial pattern issued from the ODT-model was significant (p < 0.0001). SaTScan yielded one significant cluster where the risk of disease was high, with an infection rate of 54.21%, CI95% [47.51-60.75]. Its center was located within the first high-risk ODT class. Both procedures provided similar results, identifying a high-risk cluster in the western part of the village where a mosquito breeding point was located.
Conclusion: ODT-models improve the classical scanning procedures by detecting potential disease clusters independently of any specification of the shapes, sizes or centers of the clusters. © 2005 Gaudart et al; licensee BioMed Central Ltd.
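The Methods paragraph can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a brute-force search in which every candidate line passes through a pair of point locations, scores each two-way split by the between-class sum of squares of the outcome, and then assesses the best split by randomly permuting outcomes over locations. The function names (`interclass_variance`, `best_oblique_split`, `monte_carlo_p_value`), the permutation null model and the O(n^3) search are all assumptions made for illustration; only a single split is searched, not a full tree.

```python
import itertools
import numpy as np


def interclass_variance(y, left_mask):
    """Between-class sum of squares of y for a two-way split."""
    n_left = int(left_mask.sum())
    n_right = y.size - n_left
    if n_left == 0 or n_right == 0:
        return 0.0
    overall = y.mean()
    return (n_left * (y[left_mask].mean() - overall) ** 2
            + n_right * (y[~left_mask].mean() - overall) ** 2)


def best_oblique_split(xy, y):
    """Exhaustively search straight-line splits of 2-D points.

    Candidate lines pass through each pair of points, and the two defining
    points are tried on either side of the line. Assumes points in general
    position; cost is O(n^3), so this is a sketch for small data only.
    """
    xy = np.asarray(xy, dtype=float)
    y = np.asarray(y, dtype=float)
    best_score, best_mask = -1.0, None
    for i, j in itertools.combinations(range(len(xy)), 2):
        direction = xy[j] - xy[i]
        normal = np.array([-direction[1], direction[0]])
        signed = (xy - xy[i]) @ normal  # signed offset from the line
        base = signed < 0
        for side_i, side_j in itertools.product([False, True], repeat=2):
            mask = base.copy()
            mask[i], mask[j] = side_i, side_j
            score = interclass_variance(y, mask)
            if score > best_score:
                best_score, best_mask = score, mask
    return best_score, best_mask


def monte_carlo_p_value(xy, y, n_sim=199, seed=0):
    """Monte-Carlo p-value for the best split (assumed null model:
    outcomes are exchangeable across locations)."""
    rng = np.random.default_rng(seed)
    observed, _ = best_oblique_split(xy, y)
    exceed = sum(
        best_oblique_split(xy, rng.permutation(y))[0] >= observed - 1e-12
        for _ in range(n_sim)
    )
    return (1 + exceed) / (n_sim + 1)
```

In this sketch, `xy` would hold the geographic coordinates of each child and `y` a 0/1 infection indicator; applying `best_oblique_split` recursively to each side would grow a tree with several risk classes.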