The Design of a Semantic Web Crawler Based on Multi-agent Reinforcement Learning
Cite this article: XIE Feng-ping. The Design of a Semantic Web Crawler Based on Multi-agent Reinforcement Learning [J]. Journal of ZhangZhou Teachers College (Philosophy & Social Sciences), 2010(4): 63-68.
Author: XIE Feng-ping
Affiliation: Computer Department, Minxi Vocational and Technical College, Longyan, Fujian 364021, China
Abstract: The massive volume of information on the Web gave rise to search engines; at the same time, the rapid growth and frequent updating of Web data place ever higher demands on them, and parallel search engines can increase crawling speed and improve update efficiency. The Semantic Web is a vision of the future Web, and its data face the same problems of expansion and updating as the traditional Web, so parallel search engines for the Semantic Web have become an important research direction. This paper describes the design of a basic parallel crawler system for the Semantic Web. The system consists of a central controller and several sub-crawlers: the central controller assigns crawling tasks to the sub-crawlers and aggregates the data they collect, while each sub-crawler fetches pages and extracts URLs. In addition to processing RDF documents, each sub-crawler also tries to discover more links to RDF documents from traditional HTML pages by means of reinforcement learning.
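The abstract gives only the outline of this architecture, so the following is a minimal Python sketch of how a central controller might dispatch crawl tasks to sub-crawlers and aggregate their results. The class names, the host-based partitioning, and the stubbed fetch are assumptions for illustration, not the paper's implementation.

# Minimal architecture sketch: a central controller dispatching crawl tasks
# to several sub-crawlers and aggregating what they return. All names here
# are hypothetical; the paper does not provide code.
from collections import deque
from urllib.parse import urlparse


class SubCrawler:
    """Fetches a URL, keeps RDF documents, and extracts out-links."""

    def crawl(self, url):
        # Placeholder fetch: a real crawler would download the page here.
        content, links = self.fetch(url)
        is_rdf = content.lstrip().startswith("<rdf:RDF")  # crude RDF check
        return (url if is_rdf else None), links

    def fetch(self, url):
        # Stub so the sketch runs without network access.
        return "", []


class CentralController:
    """Holds the URL frontier, assigns tasks, and collects results."""

    def __init__(self, num_crawlers):
        self.crawlers = [SubCrawler() for _ in range(num_crawlers)]
        self.frontier = deque()
        self.seen = set()
        self.rdf_documents = []

    def add_url(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.frontier.append(url)

    def assign(self, url):
        # Partition work by host so each host maps to one sub-crawler.
        index = hash(urlparse(url).netloc) % len(self.crawlers)
        return self.crawlers[index]

    def run(self, seeds, max_pages=100):
        for seed in seeds:
            self.add_url(seed)
        fetched = 0
        while self.frontier and fetched < max_pages:
            url = self.frontier.popleft()
            rdf_url, new_urls = self.assign(url).crawl(url)
            fetched += 1
            if rdf_url:
                self.rdf_documents.append(rdf_url)
            for link in new_urls:
                self.add_url(link)
        return self.rdf_documents

Partitioning the frontier by host is one common way to keep each sub-crawler polite to the sites it visits; the paper may well use a different assignment policy.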

Keywords: Semantic Web  parallel crawler  reinforcement learning

The Design of a Semantic Web Crawler Based on Multi-agent Reinforcement Learning
XIE Feng-ping. The Design of a Semantic Web Crawler Based on Multi-agent Reinforcement Learning [J]. Journal of ZhangZhou Teachers College (Philosophy & Social Sciences), 2010(4): 63-68.
Author: XIE Feng-ping
Institution: Computer Department, Minxi Vocational and Technical College, Longyan, Fujian 364021, China
Abstract: With the explosive growth and frequent updating of Web information, Web search engines face a major challenge. The Semantic Web is the next generation of the Web, and it also faces the problem of rapidly expanding and updating information. A parallel search engine can speed up crawling and improve update efficiency. This paper describes a parallel crawler system for the Semantic Web. The system has a central controller and several crawlers: the controller dispatches tasks to each crawler and collects data from them, and each crawler can process RDF documents and learn from traditional HTML pages to find more RDF links. The learning method the crawlers use is reinforcement learning.
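The abstract states that the crawlers use reinforcement learning to find RDF links in HTML pages but gives no formulation, so the sketch below assumes a simple value estimate over anchor-text words, updated toward a reward earned when a followed link leads to an RDF document. The feature choice, learning rate, and discount factor are illustrative assumptions, not the paper's method.

# Minimal sketch of the reinforcement-learning idea: score out-links by the
# expected reward of following them, where finding an RDF document yields a
# positive reward. Feature set and update rule are assumptions.
from collections import defaultdict


class LinkValueEstimator:
    def __init__(self, learning_rate=0.2, discount=0.5):
        self.values = defaultdict(float)  # value per anchor-text word
        self.alpha = learning_rate
        self.gamma = discount

    def features(self, anchor_text):
        return anchor_text.lower().split()

    def score(self, anchor_text):
        """Estimated value of following a link with this anchor text."""
        words = self.features(anchor_text)
        if not words:
            return 0.0
        return sum(self.values[w] for w in words) / len(words)

    def update(self, anchor_text, found_rdf, best_next_score):
        """Move word values toward reward + discounted future estimate."""
        reward = 1.0 if found_rdf else 0.0
        target = reward + self.gamma * best_next_score
        for w in self.features(anchor_text):
            self.values[w] += self.alpha * (target - self.values[w])


# Usage: after fetching the page behind a link, reward the estimator if the
# page turned out to be an RDF document, then pick the next best-scoring link.
estimator = LinkValueEstimator()
estimator.update("ontology rdf data", found_rdf=True, best_next_score=0.0)
estimator.update("contact us", found_rdf=False, best_next_score=0.0)
print(estimator.score("rdf ontology"))   # higher
print(estimator.score("contact page"))   # lower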
Keywords: Semantic Web  Parallel Web Crawler  Reinforcement Learning
This article has been indexed in databases including Weipu (VIP).