Research on the Web Archive Information Collection Workflow and Its Key Issues

Cited by: 13

Authors:

Liu Lan (刘兰) [1,2]

Wu Zhenxin (吴振新) [1]

Affiliations:

[1] National Science Library, Chinese Academy of Sciences

[2] Graduate University of the Chinese Academy of Sciences

Keywords:

Internet; Web archiving; information collection; collection workflow

DOI:

10.16353/j.cnki.1000-7490.2009.08.004

Chinese Library Classification (CLC) number:

G203 [Information resources and their management]

Subject classification codes:

1204; 1402

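Abstract:

Based on a survey of international Web archiving projects and systems, this paper summarizes the basic workflow of Web information collection as six stages: selection, seeking permission from content owners, harvesting, metadata extraction, quality review, and archiving. It then identifies and analyzes the key issues that arise within this workflow.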
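The six stages named in the abstract can be read as an ordered pipeline. The following is a minimal illustrative sketch, not the authors' implementation: the stage identifiers, the `Candidate` class, and the `run_workflow` function are hypothetical names introduced here, and the rule that a site without owner permission stops before harvesting is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names follow the abstract's ordering; the identifiers are hypothetical.
STAGES = [
    "selection",
    "permission",
    "harvesting",
    "metadata_extraction",
    "quality_review",
    "archiving",
]


@dataclass
class Candidate:
    """A Web resource being considered for archiving (illustrative only)."""
    url: str
    permission_granted: bool
    completed: List[str] = field(default_factory=list)


def run_workflow(candidate: Candidate) -> Candidate:
    """Walk the candidate through the stages in order.

    Assumption: if the owner has not granted permission, the workflow
    stops at the permission stage and nothing is harvested.
    """
    for stage in STAGES:
        if stage == "permission" and not candidate.permission_granted:
            break
        candidate.completed.append(stage)
    return candidate


if __name__ == "__main__":
    allowed = run_workflow(Candidate("http://example.org/", True))
    refused = run_workflow(Candidate("http://example.org/", False))
    print(allowed.completed)
    print(refused.completed)
```

Modelling the stages as an ordered list keeps the sequence explicit and makes it straightforward to record how far each candidate site progressed.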

Citation

Pages: 113-117

Page count: 5

References:

[1] The Web Curator Tool. http://webcurator.sourceforge.net/ . 2008
[2] Web Archiving Service: release 1 guide. https://wiki.cdlib.org/WebAtRisk/tiki-download_file.php?fileId=181 . 2008
[3] The Kulturarw3 project - the Royal Swedish Web Archiw3e - an example of "complete" collection of Web pages. http://www.ifla.org/IV/ifla66/papers/154-157e.htm . 2008
[4] A first experience in archiving the French Web. ABITEBOUL S, COBENA G, MASANES J, et al. ftp://ftp.inria.fr/INRIA/Projects/verso/gemo/GemoReport-229.pdf . 2008
[5] Austrian On-Line Archive. http://www.ifs.tuwien.ac.at/~aola/ . 2008
[6] Web at Risk. Collection planning guidelines. http://wiki.cdlib.org/WebAtRisk/tiki-download_file.php?fileId=327 . 2008
[7] PANDORA digital archiving system. http://pandora.nla.gov.au/pandas.html . 2008