The Split-Apply-Combine Strategy for Data Analysis

Cited by: 1953
Author
Wickham, Hadley [1]
Affiliation
[1] Rice Univ, Houston, TX 77251 USA
Source
Journal of Statistical Software | 2011 / Vol. 40 / Issue 01
Keywords
R; apply; split; data analysis
DOI
10.18637/jss.v040.i01
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored. The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.
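The strategy described in the abstract can be illustrated with a minimal sketch. This is plain Python, not the R package the paper introduces; the helper name `split_apply_combine` and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, apply):
    """Group records by key(record), run apply on each group, collect results."""
    # Split: bucket the records by their group key.
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    # Apply and combine: process each piece independently,
    # then gather the per-piece results into one dict.
    return {k: apply(piece) for k, piece in groups.items()}


# Hypothetical batting-style records (made-up values, not data from the paper).
records = [
    {"player": "a", "year": 2001, "hits": 10},
    {"player": "a", "year": 2002, "hits": 14},
    {"player": "b", "year": 2001, "hits": 7},
]

mean_hits = split_apply_combine(
    records,
    key=lambda r: r["player"],
    apply=lambda piece: mean([r["hits"] for r in piece]),
)
print(mean_hits)
```

Because each piece is processed independently, the `apply` step never needs to know how the data was stored or grouped, which is the separation of concerns the abstract points to.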
Pages: 1-29
Page count: 29