首页 | 本学科首页   官方微博 | 高级检索  
     检索      

基于部分匹配方式的可扩展P2P搜索算法
引用本文:周晋,路海明,卢增祥,李衍达.基于部分匹配方式的可扩展P2P搜索算法[J].清华大学学报(自然科学版),2004,44(10):1389-1393.
作者姓名:周晋  路海明  卢增祥  李衍达
作者单位:清华大学,自动化系,北京,100084
基金项目:国家自然科学基金资助项目(60003004)
摘    要:理想的P2P(Peer-to-Peer)搜索算法应该同时具有信息检索水平的查询质量和有效的搜索性能。然而,现有的搜索算法都不能同时较好地满足这两点。基于这两个目标,该文提出一种基于层次聚类的分布层层次聚类(DHC)搜索算法。该算法中首先利用向量空间模型将文件内容表示成向量的形式,然后经过层次聚类操作得到一棵关于全网所有文件向量的层次树,层次树信息分布式地存储于整个网络中,以层次树为路由线索,路由深度不会超过树的高度。初步仿真试验表明,该算法的查全率在80%以上,并具有对数量级的搜索与更新代价。

关 键 词:P2P网络  搜索算法  层次聚类  文件内容  可扩展性
文章编号:1000-0054(2004)10-1389-05
修稿时间:2003年6月23日

Scalable routing algorithm for partial matches in P2P systems
ZHOU Jin,LU Haiming,LU Zengxiang,LI Yanda.Scalable routing algorithm for partial matches in P2P systems[J].Journal of Tsinghua University(Science and Technology),2004,44(10):1389-1393.
Authors:ZHOU Jin  LU Haiming  LU Zengxiang  LI Yanda
Abstract:The ideal routing algorithm in a P2P system should not only provide the IR algorithms' effectiveness, but also guarantee the router's scalability. This paper describes the distributed hierarchical clustering that meets both objectives. The algorithm is based on a state-of-the-art document ranking algorithm and the vector-space model, through which the files in a P2P network are denoted as vectors. The file content vectors are used to cluster a given content to produce a content clustering tree using a hierarchical clustering algorithm. The vectors are stored in a fully distributed manner. The path length of the routings in the cluster tree will be no longer than the tree depth. Simulations show that, with distributed hierarchical clustering, search recalls are more than eighty percent successful with lookup and join/departure costs both scaling logarithmically with the number of physical nodes.
Keywords:P2P  network  routing algorithm  hierarchical clustering  file content  scalability  
本文献已被 CNKI 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号