An Attribute-Correlation-Based Method for Estimating Web Database Size

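Cited by: 28
Authors
Ling Yanyan (凌妍妍)
Meng Xiaofeng (孟小峰)
Liu Wei (刘伟)
Affiliations
[1] School of Information, Renmin University of China
[2] School of Information, Renmin University of China, Beijing
Funding
Beijing Natural Science Foundation
Keywords
word frequency; Web database size estimation; attribute correlation
DOI
Not available
Chinese Library Classification number
TP311.13
Subject classification code
1201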
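Abstract
This paper proposes a word-frequency-based method for estimating the size of a Web database. The method analyzes the correlation among the attributes in the Web database's query interface to obtain a set of random samples on one attribute. It then submits probing queries formed from the top-k most frequent words on that attribute to estimate the total number of records in the Web database. Experiments on several real Web databases show that the method estimates the size of a Web database accurately.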
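The probing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimation formula, so the ratio estimator used here (reported hit count divided by the word's frequency in the sample, averaged over the top-k words) is an assumption rather than the authors' stated method. The names `top_k_words`, `estimate_size`, `toy_db`, and `toy_hit_count` are hypothetical, and `hit_count` stands in for whatever match count the Web database reports for a probing query. How the random sample is drawn through attribute correlation is not modeled.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def top_k_words(sample_values: Sequence[str], k: int) -> List[Tuple[str, float]]:
    """Return the k most frequent words in the sample, each paired with the
    fraction of sampled attribute values that contain it."""
    if not sample_values:
        raise ValueError("sample must not be empty")
    doc_freq: Counter = Counter()
    for value in sample_values:
        # Count each word at most once per sampled value.
        doc_freq.update(set(value.lower().split()))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count / len(sample_values)) for word, count in ranked[:k]]


def estimate_size(
    sample_values: Sequence[str],
    hit_count: Callable[[str], int],
    k: int = 3,
) -> float:
    """Assumed estimator: for each top-k word w, size ~ hits(w) / p(w), where
    p(w) is the word's frequency in the sample; the estimates are averaged."""
    estimates = [hit_count(word) / p for word, p in top_k_words(sample_values, k)]
    return sum(estimates) / len(estimates)


# Toy stand-in for a Web database: one text attribute per record.
toy_db = ["red car", "red bike", "blue car", "green bus"]


def toy_hit_count(word: str) -> int:
    """Hypothetical probing query: number of records whose value contains word."""
    return sum(1 for value in toy_db if word in value.split())
```

When the sample is the whole toy table, `estimate_size(toy_db, toy_hit_count, k=3)` recovers the record count of 4. With a real sample the result is only as good as the sample's randomness, which is the part the paper addresses through attribute correlation.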
Pages: 224-236
Page count: 13