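A new method for constructing vertical search engine
Cite this article: WANG Mei-xia, LI Yu-kun, XIAO Ying-yuan. A new method for constructing vertical search engine[J]. Journal of Tianjin University of Technology, 2012, 28(4): 84-88.
Authors: WANG Mei-xia  LI Yu-kun  XIAO Ying-yuan
Affiliation: School of Computer and Communications Engineering, Tianjin University of Technology, Tianjin 300384
Funding: National Natural Science Foundation of China; Natural Science Foundation of Tianjin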
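Abstract: How to effectively build a domain-oriented vertical search engine is a question that concerns many researchers in information retrieval. This paper proposes a general method for building a vertical search engine from a professional vocabulary. By analyzing the features of web pages, it proposes a heuristic crawling strategy based on link structure and text content. The strategy draws on the structural features of web pages and, when computing the relevance between a page and the topic, takes into account the weight of feature terms within the page, which effectively improves the query efficiency of the specialized search engine. The effectiveness of the proposed method is verified by implementing a vertical search engine for the medical domain.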

Keywords: vertical search engine  construction method  professional vocabulary  web page structure

A new method for constructing vertical search engine
WANG Mei-xia, LI Yu-kun, XIAO Ying-yuan. A new method for constructing vertical search engine[J]. Journal of Tianjin University of Technology, 2012, 28(4): 84-88.
Authors: WANG Mei-xia  LI Yu-kun  XIAO Ying-yuan
Institution: (School of Computer and Communications Engineering, Tianjin University of Technology, Tianjin 300384, China)
Abstract: How to effectively construct a field-oriented vertical search engine is an important topic for many researchers in information retrieval. This paper proposes a method for constructing a field-oriented vertical search engine based on a professional vocabulary. By analyzing the characteristics of web pages, it proposes a heuristic crawling strategy that integrates the link structure and text content of web pages and uses the weight of vocabulary words within a page as a factor when computing the correlation between the page and the field concerned. This can effectively improve the query performance of the vertical search engine. The effectiveness of the proposed method is verified by developing a vertical search engine for the medical field.
Keywords: vertical search engine  constructing method  professional vocabulary  Web structure
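
The abstract describes a crawling heuristic that scores pages against a professional vocabulary, weights terms by where they appear in the page, and uses link information to decide what to crawl next. The following is only a minimal sketch of that general idea, not the authors' algorithm: the field weights, the `alpha` mixing factor, the tokenizer, and all function and class names are hypothetical choices made for illustration.

```python
import heapq
import re

# Hypothetical structural weights: vocabulary terms in the title or a
# heading count more than the same terms in the body text.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def page_relevance(fields, vocabulary, field_weights=FIELD_WEIGHTS):
    """Weighted share of tokens that belong to the vocabulary, in [0, 1].

    `fields` maps a structural field name (e.g. "title") to its text.
    """
    hit = 0.0
    total = 0.0
    for name, text in fields.items():
        weight = field_weights.get(name, 1.0)
        for token in tokenize(text):
            total += weight
            if token in vocabulary:
                hit += weight
    return hit / total if total else 0.0


def link_priority(parent_score, anchor_text, vocabulary, alpha=0.5):
    """Mix the linking page's relevance with the anchor text's relevance.

    `alpha` is an assumed mixing factor, not a value from the paper.
    """
    anchor_score = page_relevance({"body": anchor_text}, vocabulary)
    return alpha * parent_score + (1 - alpha) * anchor_score


class Frontier:
    """Crawl frontier that returns the highest-priority unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority


# Usage with a toy medical vocabulary and made-up URLs.
vocabulary = {"diabetes", "insulin"}
page = {"title": "Diabetes care", "body": "insulin dosage guide"}
score = page_relevance(page, vocabulary)

frontier = Frontier()
frontier.push("http://example.org/contact",
              link_priority(score, "contact us", vocabulary))
frontier.push("http://example.org/insulin",
              link_priority(score, "insulin therapy", vocabulary))
next_url, next_priority = frontier.pop()
```

In this sketch the link whose anchor text contains a vocabulary term is dequeued before the unrelated one, which is the behavior a topic-focused crawler is after; a real system would also need fetching, HTML parsing, and tokenization suited to the target language.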
This article is indexed by VIP, Wanfang Data, and other databases.