
Log Mining and Personalization Improvement for Mobile Search System of Government Websites
Cite this article: YE Xiaorong, SHAO Qing. Log Mining and Personalization Improvement for Mobile Search System of Government Websites[J]. Science & Technology Review, 2014, 32(36): 110-116.
Authors: YE Xiaorong, SHAO Qing
Institution: 1. Institute of Scientific and Technical Information of China, Beijing 100038, China;
2. KNET Co., Ltd., Beijing 100190, China
Abstract: To exploit the characteristics of mobile search and government websites, and the strengths of Hadoop in processing big data, a log mining and personalization system was designed and developed. First, Flume and HDFS are used to collect and store massive volumes of logs, providing the data source and programming interface for log mining. Second, MapReduce is used to analyze the logs efficiently; from the labels and navigation bars of search result pages, a vector space model of the pages and a user interest model are built. Third, based on the user interest model, the K-means clustering algorithm, again combined with MapReduce, groups users with similar interests into interest groups. Finally, the distance between a search result page and the user's interest group is calculated to judge whether the user is interested in that page; the system then adjusts the ranking of search results and pushes new pages to the user accordingly, thereby implementing personalized search and push functions.
Keywords: personalized search; personalized recommendation; cluster analysis; MapReduce
Received: 2014-10-22
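
The pipeline summarized in the abstract (page vectors built from labels, per-user interest vectors aggregated from the logs, K-means grouping, and distance-based reranking) can be illustrated with a minimal single-machine Python sketch. It is not the authors' implementation: the tag vocabulary `TAGS`, the log format `(user, page_tags)`, the L2 normalization, the Euclidean distance, and the farthest-first centroid initialization are all assumptions made for illustration, and plain loops stand in for the MapReduce jobs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tag vocabulary; the paper derives dimensions from page labels and navigation.
TAGS = ["tax", "health", "education", "transport"]

def normalize(vec):
    """Scale a vector to unit length (assumed here; left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def page_vector(tags):
    """Vector space model of one page: normalized tag counts over TAGS."""
    counts = Counter(tags)
    return normalize([float(counts[t]) for t in TAGS])

def user_interest_vectors(click_log):
    """Map/reduce-style aggregation: sum the vectors of clicked pages per user."""
    sums = defaultdict(lambda: [0.0] * len(TAGS))
    for user, tags in click_log:                  # "map": emit (user, page vector)
        for i, x in enumerate(page_vector(tags)):
            sums[user][i] += x                    # "reduce": sum by user key
    return {user: normalize(vec) for user, vec in sums.items()}

def distance(a, b):
    """Euclidean distance (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: distance(point, centroids[c]))

def farthest_first(points, k):
    """Deterministic initialization (assumed): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, iterations=20):
    """Plain K-means: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

def rerank(results, group_centroid):
    """Order (page_id, tags) results by distance to the user's interest group."""
    return sorted(results, key=lambda r: distance(page_vector(r[1]), group_centroid))

# Toy usage with made-up log entries.
click_log = [("u1", ["tax", "tax"]), ("u1", ["tax", "transport"]),
             ("u2", ["tax"]), ("u3", ["health", "education"]), ("u4", ["health"])]
interests = user_interest_vectors(click_log)
users = list(interests)
centroids, labels = kmeans([interests[u] for u in users], k=2)
group_of = dict(zip(users, labels))
results = [("p_health", ["health"]), ("p_mixed", ["tax", "health"]),
           ("p_tax", ["tax", "tax", "transport"])]
print([page_id for page_id, _ in rerank(results, centroids[group_of["u2"]])])
```

In this sketch a page is ranked higher the closer it lies to the centroid of the user's interest group. A fixed distance threshold, which the abstract implies but does not specify, could additionally decide whether a page is pushed at all.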
This article is indexed in CNKI and other databases.