
Semantic Information Retrieval Study Based on Page Segmentation
Cite this article: SHEN Da-feng. Semantic Information Retrieval Study Based on Page Segmentation [J]. Journal of Xichang College (Natural Science Edition) (西昌学院学报(自然科学版)), 2009, 23(4): 57-61.
Author: SHEN Da-feng
Institution: Modern Education Technology Center, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003
Abstract: Accurately expressing the user's intent and judging how relevant a web page is to the user's need is an important research direction in information retrieval. This paper proposes a semantic information retrieval algorithm based on web page content segmentation. Because web pages are semi-structured, the algorithm segments each page into topic regions according to its HTML tags and its content. It first builds an HTML tag tree, then merges nodes in the tree using both content similarity and visual similarity. Given a user query, the retrieval and ranking step uses this region information to search and order the relevant pages. Experimental results show that the method significantly improves search precision.

Keywords: page segmentation; semantics; information retrieval; HTML tags; similarity
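
The pipeline summarized in the abstract (tag-based regions, similarity-based merging, region-aware ranking) can be illustrated with a minimal Python sketch. This is not the paper's implementation, whose details the abstract does not give. Several simplifications are assumptions made here for illustration: the tag tree is flattened into a sequence of block-level regions, "visual similarity" is approximated by whether two adjacent regions share the same tag name, content similarity is cosine similarity over raw term counts, and a page is scored by its best-matching region. The names `BLOCK_TAGS`, `merge_blocks`, `page_score`, `rank`, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit candidate regions; the paper's tag set is unknown.
BLOCK_TAGS = {"div", "p", "td", "li", "h1", "h2", "h3", "section", "article"}


class BlockCollector(HTMLParser):
    """Collects the text found under each block-level tag as one candidate region."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of (tag, text) pairs in document order
        self._tag = None
        self._buf = []

    def flush(self):
        text = " ".join(self._buf).strip()
        if text:
            self.blocks.append((self._tag, text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
            self._tag = None

    def handle_data(self, data):
        if data.strip():
            self._buf.append(data.strip())


def segment(html):
    """Split a page into (tag, text) regions based on its HTML tags."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text after the last block tag
    return parser.blocks


def term_vector(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def merge_blocks(blocks, threshold=0.3):
    """Merge adjacent regions whose combined similarity reaches the threshold."""
    segments = []
    for tag, text in blocks:
        if segments:
            prev_tag, prev_text = segments[-1]
            content_sim = cosine(term_vector(prev_text), term_vector(text))
            # Crude stand-in for visual similarity: same tag counts fully, else half.
            visual_weight = 1.0 if prev_tag == tag else 0.5
            if content_sim * visual_weight >= threshold:
                segments[-1] = (prev_tag, prev_text + " " + text)
                continue
        segments.append((tag, text))
    return segments


def page_score(html, query):
    """Score a page by its single best-matching region, not the whole page."""
    q = term_vector(query)
    regions = merge_blocks(segment(html))
    return max((cosine(q, term_vector(text)) for _, text in regions), default=0.0)


def rank(pages, query):
    """Return page names ordered from most to least relevant to the query."""
    return sorted(pages, key=lambda name: page_score(pages[name], query), reverse=True)
```

Scoring by the best region rather than the whole page means a page whose query terms are scattered across unrelated regions (navigation, advertisements, unrelated articles) ranks below a page where they occur together in one topic region. That is the intuition behind using region information for ranking, though the paper's actual scoring function may differ.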
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).