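A Focused Crawler Based on Ontology Semantics
Cite this article: ZHENG Jian-zhen, LIN Kun-hui, ZHOU Chang-le, KANG Kai. A focused crawler based on ontology semantics[J]. Journal of Shandong University (Natural Science), 2006, 41(3): 90-94.
Authors: ZHENG Jian-zhen, LIN Kun-hui, ZHOU Chang-le, KANG Kai
Affiliations: 1. Software School, Xiamen University, Xiamen, Fujian 361005
2. School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005
Funding: Action Plan for Invigorating Education Towards the 21st Century (Project 985)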
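Abstract: A focused crawler can rapidly collect large amounts of information on a specific topic from the Web, which makes it highly valuable for domain-specific search engines and data-mining applications. Keyword-based topic filtering strategies are in common use today but have shortcomings. To address them, and inspired by the idea of concept aggregation, we propose a topic filtering strategy based on ontology semantics. Because information at different positions in a Web page differs in importance, we also propose an improved formula for computing weighted feature-term weights, which enables real-time semantic filtering of Web pages. To further improve crawling efficiency, we propose a link relevance prediction algorithm. Comparative experiments show that the strategy is feasible.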
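The abstract names two mechanisms without giving their formulas. The first scores a page against a topic ontology, weighting each term by its position on the page. The second predicts a link's relevance before the page it points to is fetched. The following minimal Python sketch shows that general idea only; it is not the paper's method. The position weights in `POSITION_WEIGHTS`, the toy concept table `TOPIC_CONCEPTS`, the blending parameter `alpha`, and all function and class names are hypothetical assumptions made for illustration.

```python
import heapq
from collections import Counter

# Hypothetical position weights (illustrative assumptions, not the paper's values):
# a term in the title counts three times as much as the same term in the body.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "anchor": 1.5, "body": 1.0}

# Hypothetical toy "ontology": each topic concept maps related terms to a
# relatedness score in [0, 1]. A real domain ontology would be far richer.
TOPIC_CONCEPTS = {
    "crawler": {"crawler": 1.0, "spider": 0.9, "robot": 0.6},
    "search": {"search": 1.0, "retrieval": 0.8, "index": 0.6},
}


def concept_weight(term):
    """Strongest relatedness of a term to any topic concept (0.0 if unrelated)."""
    return max((related.get(term, 0.0) for related in TOPIC_CONCEPTS.values()),
               default=0.0)


def page_relevance(sections):
    """Score a page given as {position: text}.

    Each term occurrence contributes position_weight * concept_weight. The sum is
    divided by the total position-weighted term count, so the result lies in [0, 1].
    Tokenization is a plain lowercase whitespace split (no stemming or punctuation handling).
    """
    score = 0.0
    total = 0.0
    for position, text in sections.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for term, freq in Counter(text.lower().split()).items():
            score += weight * freq * concept_weight(term)
            total += weight * freq
    return score / total if total else 0.0


def predict_link_relevance(parent_score, anchor_text, alpha=0.5):
    """Hypothetical predictor: blend the parent page's score with its anchor text's score."""
    anchor_score = page_relevance({"anchor": anchor_text})
    return alpha * parent_score + (1 - alpha) * anchor_score


class Frontier:
    """URL frontier that always yields the highest predicted relevance first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._count = 0  # tie-breaker keeps insertion order among equal priorities

    def push(self, url, priority):
        if url in self._seen:  # enqueue each URL at most once
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)
```

As an example, consider `page_relevance({"title": "Web crawler design", "body": "a spider fetches pages for search"})`. The result is 4.9/15 ≈ 0.327. The title term "crawler" contributes 3.0 and the body terms "spider" and "search" contribute 0.9 and 1.0, out of a weighted term count of 15. Dividing by that weighted count keeps long pages from scoring high purely on length; the paper's actual normalization may differ.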

Keywords: focused crawler; topic filtering; ontology semantics; link analysis
Article ID: 1671-9352(2006)03-0106-05
Received: 2006-03-29
Revised: 2006-03-29

Ontology based on focused crawler
ZHENG Jian-zhen,LIN Kun-hui,ZHOU Chang-le,KANG Kai.Ontology based on focused crawler[J].Journal of Shandong University,2006,41(3):90-94.
Authors:ZHENG Jian-zhen  LIN Kun-hui  ZHOU Chang-le  KANG Kai
Institution:1.Software School, Xiamen Univ., Xiamen 361005, Fujian, China; 2. Information Science and Technique Dept., Xiamen Univ., Xiamen 361005, Fujian, China
Abstract: A focused crawler can fetch large quantities of domain-specific resources from the Web in a short time, which makes it very useful for both focused search engines and data-mining applications. Keyword-based topic filtering strategies are widely used today but have deficiencies. To overcome them, the paper proposes a concept-based topic filtering strategy inspired by the idea of concept congregation. Because information in different parts of a Web page differs in importance, the paper also proposes a modified, weighted formula for calculating term weights. Together, these achieve real-time, concept-based Web page filtering. To further improve the crawler's efficiency, the paper also proposes a link relevance prediction algorithm. Finally, comparative experiments show that the proposed strategies are practical.
Keywords: focused crawler; topic filtering; ontology semantic analysis; hyperlink analysis
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.