An algorithm of mining high utility itemsets with vertical structures
Original title (Chinese): 一种垂直结构的高效用项集挖掘算法
Cite this article: HUANG Kun, WU Yujia. An algorithm of mining high utility itemsets with vertical structures[J]. Journal of Dalian University of Technology, 2017, 57(5): 524-530.
Authors: HUANG Kun  WU Yujia
Funding: Supported by the National Natural Science Foundation of China (61303046).
Abstract: Mining high utility itemsets (HUIs) is one of the active research topics in association analysis. Most HUI mining algorithms must generate a large number of candidate itemsets (CIs), which degrades their performance. HUI-Miner finds all HUIs in a transaction database without generating CIs. However, it produces a large number of utility lists (ULs), which consume excessive storage space and slow the algorithm down. To address this problem, a new data structure called the itemset list (IL) is proposed to store the utility information of transactions and items. Three pruning strategies are proposed to reduce the number of ILs, and all ILs are built in a single scan of the transaction database. A new algorithm, MHUI, is proposed that mines all HUIs directly from the ILs without generating any CIs. Experiments on three different sparse datasets show that MHUI outperforms state-of-the-art algorithms in both runtime and memory consumption.

Keywords: data mining  association analysis  frequent itemsets  high utility itemsets
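
The abstract does not define the itemset list or the three pruning strategies, so the following is not MHUI. It is only a minimal Python sketch of the underlying task, assuming the usual definition of utility (purchased quantity times unit profit) and a simple vertical layout in which each item maps to the transactions that contain it. The toy database, the profit table, and all function names are hypothetical, and the search is exhaustive with no pruning.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "d": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def build_vertical_lists(transactions, unit_profit):
    """Scan the database once; map item -> {tid: utility of the item in that transaction}."""
    lists = {}
    for tid, tx in enumerate(transactions):
        for item, qty in tx.items():
            lists.setdefault(item, {})[tid] = qty * unit_profit[item]
    return lists


def itemset_utility(itemset, lists):
    """Sum the itemset's utility over the transactions that contain every one of its items."""
    tids = set.intersection(*(set(lists[item]) for item in itemset))
    return sum(lists[item][tid] for tid in tids for item in itemset)


def mine_high_utility_itemsets(transactions, unit_profit, min_util):
    """Exhaustive enumeration for illustration only (exponential in the number of items)."""
    lists = build_vertical_lists(transactions, unit_profit)
    items = sorted(lists)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, lists)
            if utility >= min_util:
                result[itemset] = utility
    return result


huis = mine_high_utility_itemsets(transactions, unit_profit, min_util=15)
print(huis)
```

With a minimum utility of 15, this toy run reports {a} and {a, c}. Note that {c} alone falls below the threshold while its superset {a, c} exceeds it: utility is not anti-monotone, so the support-based pruning used for frequent itemsets does not carry over directly, which is why dedicated structures and pruning strategies are needed.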
This article is indexed in CNKI and other databases.