Missing data imputation, matching and other applications of random recursive partitioning

Cited by: 26
Authors
Iacus, Stefano A. [1]
Porro, Giuseppe [2]
Affiliations
[1] Univ Milan, Dept Econ Business & Stat, I-20122 Milan, Italy
[2] Univ Trieste, Dept Econ & Stat, I-34127 Trieste, Italy
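Keywords
recursive partitioning; average treatment effect estimation; classification; missing data imputation
DOI
10.1016/j.csda.2006.12.036
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203 [Computer applied technology]; 0835 [Software engineering]
Abstract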
Applications of the random recursive partitioning (RRP) method are described. The method generates a proximity matrix that can be used in non-parametric matching problems such as hot-deck missing data imputation and average treatment effect estimation. RRP is a Monte Carlo procedure that randomly generates non-empty recursive partitions of the data and calculates the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions over all the replications. The method is invariant under monotonic transformations of the data, also in the presence of missing data, but no other formal properties of the method are known yet. Therefore, Monte Carlo experiments were conducted to explore its performance. Companion software is available as a package for the R statistical environment. (C) 2007 Elsevier B.V. All rights reserved.
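The proximity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm or their R package: the splitting rule (a random covariate and a random rank-based cut), the function names `rrp_proximity` and `best_match`, and the parameters `n_rep`, `min_size` and `seed` are all assumptions made for illustration, and missing values are not handled. The sketch only shows how co-occurrence in the cells of random recursive partitions turns into a proximity matrix, and how such a matrix could pick a hot-deck donor or a match.

```python
import random


def _split(X, idx, min_size, rng, cells):
    """Recursively cut the rows in idx; collect the final cells in `cells`."""
    n = len(idx)
    if n < 2 * min_size:              # too small to cut into two valid cells
        cells.append(idx)
        return
    j = rng.randrange(len(X[0]))      # hypothetical rule: random covariate
    values = sorted(X[i][j] for i in idx)
    k = rng.randint(min_size, n - min_size)   # random rank for the cut
    cut = values[k - 1]               # cut depends on ranks only
    left = [i for i in idx if X[i][j] <= cut]
    right = [i for i in idx if X[i][j] > cut]
    if len(left) < min_size or len(right) < min_size:
        cells.append(idx)             # ties made the cut invalid: keep as cell
        return
    _split(X, left, min_size, rng, cells)
    _split(X, right, min_size, rng, cells)


def rrp_proximity(X, n_rep=200, min_size=2, seed=0):
    """Share of replications in which two rows fall in the same cell."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_rep):
        cells = []
        _split(X, list(range(n)), min_size, rng, cells)
        for cell in cells:
            for a in cell:
                for b in cell:
                    counts[a][b] += 1
    return [[c / n_rep for c in row] for row in counts]


def best_match(P, i, donors):
    """Pick the donor with the highest proximity to row i."""
    return max(donors, key=lambda d: P[i][d])


# Toy data: two well-separated groups of three observations each.
X = [[0.1, 1.0], [0.2, 1.1], [0.15, 0.9],
     [5.0, 7.0], [5.2, 7.1], [4.9, 6.8]]
P = rrp_proximity(X, n_rep=500, min_size=2, seed=1)
donor = best_match(P, 0, [1, 3])
```

Because the cuts in this sketch depend only on the ranks of the covariate values, applying a strictly increasing transformation to a column leaves the proximity matrix unchanged for a fixed seed, which mirrors the invariance property stated in the abstract.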
Pages: 773-789
Number of pages: 17