
Research and Implementation of an Adaptive Crawl Algorithm for the Retrieval Research Environment
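Cite this article: HAO Mengtao. Research and implementation of an adaptive crawl algorithm for the retrieval research environment[J]. 科学技术与工程 (Science Technology and Engineering), 2014, 14(1).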
Author: HAO Mengtao (郝孟涛)
Affiliation: 计算机科学联合研究院 (Joint Research Institute of Computer Science)
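Abstract: The retrieval research environment is an offline experimental environment, built to mirror the modules of a search engine's online retrieval side, in which strategy R&D engineers investigate new strategies. The most direct way to verify that a strategy works is to evaluate the retrieval results. An important way to assess a strategy is to build two research environments, the original environment and the environment with the strategy upgrade, crawl both with the same batch of queries, and send the crawled results for evaluation: human raters compare and score the two sets side by side, and the strategy R&D engineers analyze the scores to decide whether to launch the strategy. This should be a fast iteration loop, but every stage can lengthen the iteration cycle. The stability of the research environments, insufficient resources, and low crawl efficiency all prolong the strategy research phase, so improving research efficiency has become an increasingly pressing problem. This paper studies how crawling of the research environments can be improved and implements an adaptive, online-learning crawl algorithm that greatly improves crawl efficiency and reduces the negative impact of crawling on research efficiency.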

Keywords: search engine; retrieval research; self-adaptive; crawl algorithm
Received: 2013-08-04
Revised: 2013-08-27

Study and Realization of Adaptive Crawl Algorithm for Retrieval Research Environment
Abstract: The retrieval research environment is an offline experimental environment built to mirror the search modules of an online search engine; strategy R&D engineers use it to research new strategies. Evaluating retrieval results is the most direct way to validate a strategy. Two research environments, the original environment and the environment after the strategy upgrade, are built and crawled with the same batch of queries. The crawled results are sent for manual side-by-side scoring, and the R&D engineers then analyze the assessment to decide whether to launch the strategy. This should be a fast iteration process, but every stage can lengthen the iteration cycle. The stability of the research environments, insufficient resources, and crawl efficiency can all lengthen the strategy research phase, so improving research efficiency has become a growing problem. This research improves crawling in the research environment; the implemented self-adaptive, online-learning crawl algorithm greatly improves crawl efficiency and thereby research efficiency.
Keywords: search engine; retrieval research; self-adaptive learning; crawl algorithm
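The two-environment evaluation workflow described in the abstract (crawl the original and the upgraded environment with the same query batch, then pair the results for side-by-side scoring) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's adaptive crawl algorithm, whose details the abstract does not give. Each environment is modeled as a hypothetical Python callable that maps a query to a ranked result list, and `crawl_pair`, `PairedResult`, and `changed_queries` are illustrative names. The `changed_queries` pre-filter is an assumption added for illustration and is not described in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a research environment is a function that takes a
# query and returns its ranked result list (e.g. URLs). A real environment
# would be queried over the network; that part is out of scope here.
SearchEnv = Callable[[str], List[str]]


@dataclass
class PairedResult:
    """Results for one query from the original and the upgraded environment."""
    query: str
    baseline: List[str]
    experiment: List[str]


def crawl_pair(queries: List[str], baseline: SearchEnv,
               experiment: SearchEnv) -> List[PairedResult]:
    """Crawl the same query batch against both environments and pair results."""
    return [PairedResult(q, baseline(q), experiment(q)) for q in queries]


def changed_queries(pairs: List[PairedResult]) -> List[str]:
    """Hypothetical pre-filter: queries whose result lists differ between the
    two environments, i.e. where side-by-side scoring can reveal a change."""
    return [p.query for p in pairs if p.baseline != p.experiment]


if __name__ == "__main__":
    # Toy environments: the "upgraded" one swaps the top two results for "x".
    def original(q: str) -> List[str]:
        return [f"{q}/a", f"{q}/b"]

    def upgraded(q: str) -> List[str]:
        return [f"{q}/b", f"{q}/a"] if q == "x" else original(q)

    pairs = crawl_pair(["x", "y"], original, upgraded)
    print(changed_queries(pairs))  # ['x']
```

Keeping the crawl step separate from the pairing and filtering makes it easy to swap in a real network client, or a smarter crawl scheduler, without touching the evaluation side.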