
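Entry page search algorithm based on URL-type prior probabilities
Cite this article: HU Jun-gang, DONG Shou-bin, CHEN Xiao-zhi, ZHANG Yuan-feng. Entry page search algorithm based on URL-type prior probabilities [J]. Journal of Shandong University (Natural Science), 2006, 41(3): 76-80.
Authors: HU Jun-gang  DONG Shou-bin  CHEN Xiao-zhi  ZHANG Yuan-feng
Affiliation: Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou 510641, Guangdong, China
Abstract: An entry page (home page) query has exactly one correct result, and the user's query is usually a short page name. Because the task demands higher precision, it is generally considered difficult. Based on a language-model analysis, the fields that are useful for Chinese entry page retrieval are identified and used as the content fields for baseline retrieval. Given the importance of non-content page priors (such as URL type), a retrieval model is built that combines content fields with non-content page features. Probability statistics on the URL-type prior reveal a strong association between an entry page and its relevant sub-pages. On this basis, the PERS (page extracted from relevant sub page) algorithm is proposed, which extracts the entry page from relevant sub-pages. Comparative experiments show that PERS substantially improves retrieval performance.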
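As a rough illustration of how a URL-type prior can be combined with a content score, the Python sketch below classifies URLs and re-ranks a result list. The four categories (root, subroot, path, file) follow a classification commonly used in the entry page search literature; the abstract does not list the paper's own categories. The prior values, the INDEX_FILES set, and the function names url_type and rerank are hypothetical placeholders, not the authors' figures or code.

```python
import math
from urllib.parse import urlparse

# Hypothetical: file names treated as the default document of a directory.
INDEX_FILES = {"index.html", "index.htm", "default.htm", "default.asp"}

# Hypothetical priors P(entry page | URL type); in practice these would be
# estimated from labelled training data, not hard-coded.
URL_TYPE_PRIOR = {"root": 0.60, "subroot": 0.25, "path": 0.10, "file": 0.05}


def url_type(url: str) -> str:
    """Classify a URL as root, subroot, path, or file (assumed scheme)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and segments[-1].lower() in INDEX_FILES:
        # An index document stands for the directory that contains it.
        segments = segments[:-1]
        is_dir = True
    else:
        # Without a trailing slash the last segment is treated as a file.
        is_dir = path.endswith("/") or not segments
    if not segments:
        return "root"
    if not is_dir:
        return "file"
    return "subroot" if len(segments) == 1 else "path"


def rerank(results):
    """Sort (url, content_log_score) pairs by content score plus log prior."""
    def combined(item):
        url, content_log_score = item
        return content_log_score + math.log(URL_TYPE_PRIOR[url_type(url)])
    return sorted(results, key=combined, reverse=True)


if __name__ == "__main__":
    hits = [
        ("http://example.com/news/2006/item.html", -4.0),
        ("http://example.com/", -4.5),
    ]
    for url, _ in rerank(hits):
        print(url_type(url), url)
```

With these placeholder priors, the site root overtakes a deep file whose content score is slightly better, which is the kind of effect a URL-type prior is meant to produce.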

Keywords: entry page retrieval  URL-type prior  information retrieval
Article ID: 1671-9352(2006)03-0063-05
Received: 2006-03-29
Revised: 2006-03-29

Entry page search algorithm based on URL-type prior probabilities
HU Jun-gang, DONG Shou-bin, CHEN Xiao-zhi, ZHANG Yuan-feng. Entry page search algorithm based on URL-type prior probabilities [J]. Journal of Shandong University, 2006, 41(3): 76-80.
Authors:HU Jun-gang  DONG Shou-bin  CHEN Xiao-zhi  ZHANG Yuan-feng
Institution:Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou 510641, Guangdong, China
Abstract:Entry page (home page) retrieval aims to return exactly one correct document, and the queries are usually short Web page names. As a result, retrieving the entry page with high initial precision is quite difficult. Using a unigram language model, the authors identify the Web page content fields that are useful for finding Chinese entry pages and use them for baseline retrieval. They then build a new model that combines these content fields with non-content features of Web pages (e.g. the URL-type prior, which proved to have the strongest predictive power). The prior probabilities of URL types reveal a relationship between an entry page and its sub-pages. Based on this relationship, the authors propose a new algorithm, PERS, in which the entry page is extracted from relevant sub-pages. Finally, the results are re-ranked, and PERS yields a large improvement in entry page retrieval performance.
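The abstract does not spell out the steps of PERS, so the following Python sketch shows only one plausible reading of "extracting the entry page from relevant sub-pages": each retrieved sub-page votes for the directory URLs above it, and the candidates are ranked by accumulated votes. The names ancestor_urls and candidate_entry_pages, the decay parameter, and the scoring rule are all assumptions made for illustration, not the published algorithm.

```python
from collections import defaultdict
from urllib.parse import urlparse


def ancestor_urls(url):
    """Return the directory URLs above `url`, from the site root downwards."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return []  # the site root has no ancestors
    base = f"{parts.scheme}://{parts.netloc}/"
    ancestors = [base]
    # Each proper directory prefix of the path is a candidate entry page.
    for i in range(1, len(segments)):
        ancestors.append(base + "/".join(segments[:i]) + "/")
    return ancestors


def candidate_entry_pages(ranked_subpages, decay=0.5):
    """Accumulate sub-page relevance scores onto their ancestor URLs.

    `ranked_subpages` is a list of (url, score) pairs with non-negative
    scores. The nearest ancestor receives the full score; each level
    further up receives `decay` times less (a hypothetical weighting).
    """
    votes = defaultdict(float)
    for url, score in ranked_subpages:
        for distance, ancestor in enumerate(reversed(ancestor_urls(url))):
            votes[ancestor] += score * decay ** distance
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    subpages = [
        ("http://example.com/dept/a.html", 0.9),
        ("http://example.com/dept/b.html", 0.8),
        ("http://example.com/other/c.html", 0.3),
    ]
    for url, vote in candidate_entry_pages(subpages):
        print(f"{vote:.2f}  {url}")
```

In this toy input two relevant sub-pages share the /dept/ directory, so that directory collects the most votes and surfaces as the top entry page candidate even though it was never retrieved directly.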
Keywords:Entry page retrieval  URL-type priority  information retrieval
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.