
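Design and implementation of an outlier mining algorithm in data-intensive computing environments
Cite this article: CHEN Ya-li, ZHANG Long-bo, LI Cai-hong, ZHANG Shu-sen, LIU Xi-yu. Design and implementation of an outlier mining algorithm in data-intensive computing environments[J]. Journal of Shandong University of Technology, 2013(5): 32-35.
Authors: CHEN Ya-li  ZHANG Long-bo  LI Cai-hong  ZHANG Shu-sen  LIU Xi-yu
Affiliation: School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255091
Funding: Shandong Provincial Natural Science Foundation (ZR2011FL013); Shandong Province Higher Education Science and Technology Program (J13LN27)
Abstract: In data-intensive computing environments, data are massive, rapidly changing, distributed in storage, and heterogeneous, which poses new challenges for the design and implementation of data mining algorithms. Based on the MapReduce model, this paper proposes MR_LOF, an outlier mining algorithm that combines a grid technique with the LOF-based method. In the Map phase, a grid is used for data reduction, and the representative-point information is sent to the master node. In the Reduce phase, a density-based outlier mining algorithm is applied, and dense regions are filtered out using the grid expected value E. The algorithm only needs to compute LOF values for objects in sparse regions, which reduces its time complexity. Experimental results show that the method mines outliers effectively in data-intensive computing environments.

Keywords: data mining  outlier  data-intensive  MapReduce  MR_LOF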

Design and application of outlier mining algorithm in data-intensive computing environments
CHEN Ya-li, ZHANG Long-bo, LI Cai-hong, ZHANG Shu-sen, LIU Xi-yu. Design and application of outlier mining algorithm in data-intensive computing environments[J]. Journal of Shandong University of Technology: Science and Technology, 2013(5): 32-35.
Authors:CHEN Ya-li  ZHANG Long-bo  LI Cai-hong  ZHANG Shu-sen  LIU Xi-yu
Institution: School of Computer Science and Technology, Shandong University of Technology, Zibo 255091, China
Abstract: Data characteristics such as massive volume, rapid change, distributed storage, and heterogeneity bring new challenges to the design of outlier mining algorithms in data-intensive computing environments. In this paper, MR_LOF, a density-based outlier mining algorithm combined with a grid technique, is put forward on the basis of the MapReduce model. During the Map phase, a grid is used to reduce the data, and the representative-point information is then sent to the primary node. In the Reduce phase, a density-based outlier mining algorithm is employed, and dense areas are screened out using the grid's expected value E. The algorithm calculates LOF values only for data in sparse areas, which reduces its time complexity. Experimental results show that the algorithm is effective for mining outliers in data-intensive computing environments.
Keywords:data mining  outlier  data-intensive  MapReduce  MR_LOF
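The two phases described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' MR_LOF implementation: the function names `map_phase` and `reduce_phase`, the `cell_size` and `k` parameters, and the definition of E as the mean point count per non-empty cell are all assumptions made for illustration, since the abstract does not specify them. The sketch also passes point indices directly instead of the representative-point summaries mentioned in the abstract.

```python
import math
from collections import defaultdict
from functools import lru_cache


def map_phase(points, cell_size):
    """Map step (sketch): assign each point index to a grid cell."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(idx)
    return cells


def reduce_phase(points, cells, k):
    """Reduce step (sketch): skip dense cells, score the rest with LOF."""
    # ASSUMPTION: E is the mean number of points per non-empty cell.
    expected = sum(len(v) for v in cells.values()) / len(cells)
    # Only points in cells below E are LOF candidates.
    candidates = [i for idxs in cells.values()
                  if len(idxs) < expected for i in idxs]

    @lru_cache(maxsize=None)
    def knn(i):
        # k nearest neighbours of point i as (distance, index) pairs.
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(len(points)) if j != i)
        return tuple(dists[:k])

    @lru_cache(maxsize=None)
    def lrd(i):
        # Local reachability density of point i.
        reach = [max(knn(j)[-1][0], d) for d, j in knn(i)]
        total = sum(reach)
        return math.inf if total == 0 else len(reach) / total

    def lof(i):
        own = lrd(i)
        if math.isinf(own):
            return 1.0  # duplicate points: treat as an inlier
        neigh = knn(i)
        return sum(lrd(j) for _, j in neigh) / (len(neigh) * own)

    return {i: lof(i) for i in candidates}


# Hypothetical data: a tight 3x3 cluster plus one distant point (index 9).
points = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
points.append((5.5, 5.5))
scores = reduce_phase(points, map_phase(points, cell_size=1.0), k=3)
```

In this toy run only the isolated point falls in a cell below E, so it is the only point for which an LOF value is computed. That is the source of the saving the abstract refers to, although neighbour densities in dense cells are still evaluated on demand.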
This article is indexed in CNKI, VIP, and other databases.