A Study of Practical Deduplication

Cited by: 309
Authors
Meyer, Dutch T. [1]
Bolosky, William J. [1]
Affiliations
[1] Univ British Columbia, Microsoft Res, Vancouver, BC V5Z 1M9, Canada
Keywords
Measurement; Performance; Deduplication; Windows; filesystem; data; study;
DOI
10.1145/2078861.2078864
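Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract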
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.
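The comparison described in the abstract can be illustrated with a minimal sketch. It assumes fixed-size 4096-byte blocks and SHA-256 content hashes; both are illustrative choices, not the method used in the study, and the function names `whole_file_savings` and `block_savings` are hypothetical.

```python
import hashlib


def whole_file_savings(files):
    """Fraction of bytes saved when each distinct file is stored only once."""
    total = sum(len(data) for data in files)
    unique = {}
    for data in files:
        # Files with identical content hash to the same key.
        unique[hashlib.sha256(data).digest()] = len(data)
    stored = sum(unique.values())
    return (total - stored) / total if total else 0.0


def block_savings(files, block_size=4096):
    """Fraction of bytes saved when each distinct fixed-size block is stored once.

    Assumption: fixed-size, aligned blocks. Other chunking schemes exist and
    would give different numbers.
    """
    total = sum(len(data) for data in files)
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    stored = sum(unique.values())
    return (total - stored) / total if total else 0.0


if __name__ == "__main__":
    # Toy corpus: two identical files plus one that shares only its first block.
    files = [
        b"A" * 8192,
        b"A" * 8192,
        b"A" * 4096 + b"B" * 4096,
    ]
    print("whole-file savings:", whole_file_savings(files))
    print("block-level savings:", block_savings(files))
```

On this toy corpus, whole-file deduplication removes only the exact duplicate, while block-level deduplication also removes the repeated blocks within and across files. The ratio between the two savings figures is the kind of quantity the abstract reports for real file systems.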
Pages: 20