
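A hash-based frequent pattern mining algorithm for data streams using interest degree
Cite this article: JU Chun-hua, YIN Xian-jun. A hash-based frequent pattern mining algorithm for data streams using interest degree[J]. Systems Engineering - Theory & Practice, 2012, 32(12): 2764-2773. DOI: 10.12011/1000-6788(2012)12-2764
Authors: JU Chun-hua, YIN Xian-jun
Affiliations: 1. College of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310015; 2. Modern Business Research Center, Zhejiang Gongshang University, Hangzhou 310015
Funding: National Natural Science Foundation of China (71071141); Specialized Research Fund for the Doctoral Program of Higher Education (20103326110001); Key Project of the Zhejiang Provincial Natural Science Foundation (Z1091224); Modern Business Research Center of Zhejiang Gongshang University (11JDSM02Z)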
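Abstract: Frequent pattern mining underlies much of the work in data stream mining. Existing algorithms can mine approximate frequent patterns from data streams effectively, but because stream data is uncertain, continuous, and massive, they have not been able to keep time and space costs within an acceptable range. This paper uses a hash table as the storage structure for synopsis data and introduces the concept of the interest degree of association rules. On this basis it proposes MIFS-HT (mining interesting frequent itemsets with hash table), a frequent pattern mining algorithm for data streams that reduces the time and space complexity of existing algorithms and increases the practical value of the results. Experiments show that MIFS-HT is an efficient frequent pattern mining algorithm for data streams, that it outperforms algorithms such as FP-Stream and Lossy Counting, and that its mining results are more meaningful in practice.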

Keywords: data stream; frequent pattern; interest degree; MIFS-HT
Received: 2010-08-11

Mining approximate frequency itemsets over data streams based on hash and interesting degree
JU Chun-hua, YIN Xian-jun. Mining approximate frequency itemsets over data streams based on hash and interesting degree[J]. Systems Engineering - Theory & Practice, 2012, 32(12): 2764-2773. DOI: 10.12011/1000-6788(2012)12-2764
Authors: JU Chun-hua, YIN Xian-jun
Affiliation: 1. College of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310015, China; 2. Modern Business Research Center, Zhejiang Gongshang University, Hangzhou 310015, China
Abstract: Frequent itemset mining, a foundation of data stream mining, has received growing attention from researchers. Because data streams are uncertain, continuous, and very large, many mining algorithms struggle to handle them. This paper introduces a hash table and the interest degree of association rules: the former serves as the synopsis data structure, and the latter incorporates the attention of customers. On this basis, a new frequent itemset mining algorithm named MIFS-HT (mining interesting frequent itemsets with hash table) is proposed. Compared with lossy counting and a similar algorithm, mining frequent item sets over data streams by matrix (MISM for short), MIFS-HT is shown to be more efficient in both time and space.
Keywords: data stream; frequent itemset; degree of interest; MIFS-HT
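
The abstract names lossy counting as a baseline and a hash table as the synopsis structure, but this record does not describe MIFS-HT itself. As a rough illustration of the baseline only, the following is a minimal sketch of the classic Lossy Counting scheme for single items (not itemsets), with a Python dict standing in for the hash table. The function names `lossy_count` and `frequent_items` and the parameter names are assumptions made for this example; they do not come from the paper, and the sketch does not model interest degree.

```python
import math


def lossy_count(stream, epsilon):
    """Illustrative Lossy Counting over single items (not the paper's MIFS-HT).

    Returns (table, n), where table maps item -> [count, max_undercount]
    and n is the number of items seen. Stored counts undercount the true
    frequency by at most epsilon * n.
    """
    width = math.ceil(1 / epsilon)      # bucket width
    table = {}                          # hash table used as the synopsis
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)   # id of the current bucket
        if item in table:
            table[item][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 earlier buckets.
            table[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            stale = [k for k, (f, d) in table.items() if f + d <= bucket]
            for key in stale:
                del table[key]
    return table, n


def frequent_items(table, n, support, epsilon):
    """Report items whose stored count is at least (support - epsilon) * n."""
    return {k: f for k, (f, d) in table.items() if f >= (support - epsilon) * n}


if __name__ == "__main__":
    table, n = lossy_count(list("abacabadabae"), epsilon=0.25)
    print(frequent_items(table, n, support=0.5, epsilon=0.25))
```

Pruning at each bucket boundary is what keeps the table small: an item survives only if its count plus its possible undercount exceeds the current bucket id, so rare items are evicted while the error on retained counts stays bounded by `epsilon * n`.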
This article is indexed in CNKI, Wanfang Data, and other databases.