
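Cloud-based business intelligence gathering system
Cite this article: XU Yun-feng, ZHANG Yan, ZHAO Tie-jun. Cloud-based business intelligence gathering system[J]. Journal of Hebei University of Science and Technology, 2012, 33(2): 161-165
Authors: XU Yun-feng (许云峰), ZHANG Yan (张妍), ZHAO Tie-jun (赵铁军)
Affiliations: 1. College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018
2. Hebei Communication Construction Company Limited, Shijiazhuang, Hebei 050021
Funding: Supported by the Hebei Province Science and Technology Support Program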
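Abstract: A business intelligence gathering system differs from a traditional search engine system: intelligence is time-sensitive and targeted, and the data classification and clustering techniques used in traditional search engines cannot fully meet the special requirements for timeliness and pertinence that arise during business intelligence gathering. This paper proposes a business intelligence gathering solution that, in a cloud computing environment, uses a Bayesian classification algorithm together with several web page deduplication and extraction algorithms to crawl, analyze, classify, and cluster Internet data in real time, forming a comprehensive, multi-dimensional intelligence ontology for users. The massive volume of crawled data is stored in a distributed file system, and the gathered intelligence is stored in CouchDB, a cloud-based database.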

Keywords: intelligence gathering; search engine; classification; clustering; cloud computing
Received: 2011-11-04
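
The abstract names Bayesian classification as the method for sorting crawled pages, but gives no implementation details. The following is only a minimal sketch of one common variant, multinomial naive Bayes with add-one smoothing, and is not the authors' implementation. The class name `NaiveBayesClassifier`, the token lists, and the category labels are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Illustrative only: the paper does not specify which Bayesian
    variant or which features it uses.
    """

    def __init__(self):
        self.class_doc_counts = Counter()               # documents seen per class
        self.class_token_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()                         # all tokens seen in training

    def train(self, tokens, label):
        """Record one labeled document, given as a list of tokens."""
        self.class_doc_counts[label] += 1
        self.class_token_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def predict(self, tokens):
        """Return the label with the highest log posterior, or None if untrained."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_doc_counts.items():
            token_counts = self.class_token_counts[label]
            total_tokens = sum(token_counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log(
                    (token_counts[token] + 1) / (total_tokens + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: pre-tokenized page text with made-up labels.
clf = NaiveBayesClassifier()
clf.train(["merger", "acquisition", "shares"], "finance")
clf.train(["stock", "shares", "profit"], "finance")
clf.train(["product", "launch", "smartphone"], "product")
clf.train(["new", "product", "release"], "product")
print(clf.predict(["shares", "profit"]))
```

Working in log space avoids numeric underflow on long documents, and the add-one smoothing keeps a token that was never seen in a class from forcing that class's probability to zero.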

Cloud-based business intelligence gathering system
XU Yun-feng, ZHANG Yan, ZHAO Tie-jun. Cloud-based business intelligence gathering system[J]. Journal of Hebei University of Science and Technology, 2012, 33(2): 161-165
Authors:XU Yun-feng  ZHANG Yan  ZHAO Tie-jun
Affiliation: 1. College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China; 2. Hebei Communication Construction Company Limited, Shijiazhuang, Hebei 050021, China
Abstract: A business intelligence gathering system differs from a traditional search engine system. The data classification and clustering techniques of traditional search engines cannot fully meet the special needs for timeliness and pertinence in the business intelligence gathering process. This paper presents a business intelligence gathering solution that uses a Bayesian classification algorithm and duplicate web page removal algorithms in a cloud computing environment to capture, analyze, classify, and cluster Internet data in real time, forming a comprehensive, multi-dimensional intelligence ontology for users. The large volume of captured data is stored in a distributed file system, and the gathered intelligence is stored in the cloud database CouchDB.
Keywords:intelligence gathering  search engine  classification  clustering  cloud computing
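
The abstract mentions removing duplicate web pages without naming the algorithms used. As a rough illustration only, the sketch below uses word shingles with Jaccard similarity, a standard near-duplicate technique substituted here as an assumption. The function names, the shingle size `k=3`, and the threshold `0.8` are hypothetical choices, not values from the paper.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(pages, threshold=0.8, k=3):
    """Keep each page unless it is a near-duplicate of a page already kept."""
    kept = []  # list of (page, shingle_set) pairs
    for page in pages:
        sig = shingles(page, k)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((page, sig))
    return [page for page, _ in kept]


# Hypothetical page texts, already stripped of markup.
pages = [
    "Acme Corp announces record quarterly profit for 2011",
    "ACME Corp announces record quarterly profit for 2011",
    "Beta Ltd opens a new factory in Hebei",
]
print(len(deduplicate(pages)))
```

This version compares every new page against every kept page, so its cost grows quadratically with the number of pages. At crawl scale, a hashing scheme such as MinHash with locality-sensitive hashing is the usual way to avoid the pairwise comparison.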
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.