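Research on Sequential Pattern Discovery in Distributed Environments
Cite this article: ZOU Xiang, ZHANG Wei, XIAO Ming-jun, CAI Qing-sheng. Research on Sequential Pattern Discovery in Distributed Environments[J]. Journal of Fudan University (Natural Science), 2004, 43(5): 737-741.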
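Authors: ZOU Xiang  ZHANG Wei  XIAO Ming-jun  CAI Qing-sheng
Affiliation: Department of Computer Science, University of Science and Technology of China, Hefei 230027 (all four authors)
Funding: Supported by the National Natural Science Foundation of China (70171052, 60075015)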
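Abstract: An algorithm called DMSP (Distributed Mining of Sequential Patterns) is proposed to solve the problem of sequential pattern mining in a distributed environment. Its main ideas are: prefix projection is used to partition the pattern search space, reduce the size of the database, and generate local sequential patterns; polling sites designated by pattern prefix are used to reduce communication overhead; and multiple threads run asynchronously to improve the parallelism of the algorithm. Experimental results show that, in a LAN environment with massive data, DMSP outperforms centralizing the data and then applying the GSP algorithm by more than 65%.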

Keywords: data mining  sequential pattern  distributed algorithm
Article ID: 0427-7104(2004)05-0737-05

The Research Sequential Pattern Discovery in Distributed Environment
ZOU Xiang,ZHANG Wei,XIAO Ming-jun,CAI Qing-sheng.The Research Sequential Pattern Discovery in Distributed Environment[J].Journal of Fudan University(Natural Science),2004,43(5):737-741.
Authors:ZOU Xiang  ZHANG Wei  XIAO Ming-jun  CAI Qing-sheng
Abstract: An algorithm called DMSP (Distributed Mining of Sequential Patterns) is proposed for mining sequential patterns in a distributed environment. The main ideas are as follows: each site uses prefix projection to partition the pattern search space and reduce the size of the database when generating local sequential patterns; each site uses a polling site associated with the pattern prefix to reduce communication cost; and multiple threads run asynchronously at each site to increase the parallelism of the algorithm. Experiments show that, over a LAN with a huge amount of data, DMSP outperforms centralizing the data and then applying the GSP algorithm by more than 65 percent, and that it is scalable.
Keywords:data mining  sequential pattern  distributed algorithm
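
The abstract names two mechanisms, prefix projection and prefix-based polling sites, without showing them. The Python sketch below is an illustration only and is not the authors' DMSP implementation: it simulates the sites in a single process, treats each sequence as a list of single items, and mines every local pattern (local threshold 1) so that the summed global counts are exact. The function names `project`, `local_patterns`, `polling_site`, and `aggregate`, and the use of a CRC32 hash of the first item to pick the polling site, are all assumptions introduced here. The asynchronous multi-threading mentioned in the abstract is not modeled.

```python
from collections import Counter
from zlib import crc32


def project(database, item):
    """Prefix projection: keep the suffix after the first occurrence of item."""
    projected = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend the pattern
                projected.append(suffix)
    return projected


def local_patterns(database, min_support, prefix=()):
    """Recursively mine patterns from one site's (projected) database."""
    counts = Counter()
    for seq in database:
        counts.update(set(seq))  # count each item once per sequence
    patterns = {}
    for item, support in sorted(counts.items()):
        if support >= min_support:
            pattern = prefix + (item,)
            patterns[pattern] = support
            # The projected database is smaller than the one it came from.
            patterns.update(
                local_patterns(project(database, item), min_support, pattern)
            )
    return patterns


def polling_site(pattern, num_sites):
    """Assumed rule: a stable hash of the pattern's first item picks the site."""
    return crc32(pattern[0].encode("utf-8")) % num_sites


def aggregate(site_databases, min_support):
    """Send local counts to polling sites, then keep globally frequent patterns."""
    num_sites = len(site_databases)
    inbox = [Counter() for _ in range(num_sites)]
    for database in site_databases:
        # Local threshold 1 is a simplification so that global sums are exact.
        for pattern, support in local_patterns(database, 1).items():
            inbox[polling_site(pattern, num_sites)][pattern] += support
    frequent = {}
    for counts in inbox:
        frequent.update({p: s for p, s in counts.items() if s >= min_support})
    return frequent


if __name__ == "__main__":
    site0 = [["a", "b", "c"], ["a", "c"]]
    site1 = [["a", "b"], ["b", "c"]]
    for pattern, support in sorted(aggregate([site0, site1], 2).items()):
        print(pattern, support)
```

Because every pattern sharing a first item is routed to the same polling site, each pattern's counts are summed in exactly one place and no site needs the full database. Mining with a local threshold of 1 is exponential in the worst case, so a practical system would need a pruning rule that this sketch does not attempt.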
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.