GDLOF: A fast local outlier detection algorithm based on grids and dense cells
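Cite this article: Zhang Jing, Sun Zhihui. GDLOF: a fast local outlier detection algorithm based on grids and dense cells[J]. Journal of Southeast University (Natural Science Edition), 2005, 35(6): 863-866.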
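Authors: Zhang Jing (张净), Sun Zhihui (孙志挥)
Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096; College of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212001
2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096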
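Funding: Project funded by the Chinese Academy of Sciences; Specialized Research Fund for the Doctoral Program of Higher Education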
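Abstract: To cope with the sparsity of high-dimensional, large-scale datasets, and to remedy the unsatisfactory computational cost and time efficiency of existing outlier detection algorithms on such datasets, this paper builds on LOF, the density-based local outlier detection algorithm, and proposes the concepts of dense cells and dense regions together with a fast local outlier detection algorithm based on grids and dense cells. Unlike earlier work, which uses the number of points as the threshold for deciding whether a region is dense, the algorithm bases this decision on the neighbouring points around each point in the dataset. By proving that points in dense cells and dense regions cannot be outliers, the algorithm reduces the number of LOF values that must be computed and significantly improves efficiency. Experiments show that the algorithm is applicable and effective for high-dimensional, large-scale datasets.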
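The abstract builds on LOF without restating it. For reference, here is a minimal Python sketch of the standard LOF score as defined by Breunig et al. (2000): k-distance, reachability distance, local reachability density (lrd), and the LOF ratio. It is a brute-force O(n²) illustration, not the authors' implementation; the function name `lof_scores` is hypothetical, and the sketch assumes the points are distinct (duplicate points give zero reachability distances and an undefined density).

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def lof_scores(points: List[Point], k: int) -> List[float]:
    """Standard LOF score of every point, computed by brute force (assumes distinct points)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance of each point and its k-distance neighbourhood (ties are included).
    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(math.inf if mean_reach == 0 else 1.0 / mean_reach)

    # LOF: mean ratio of the neighbours' lrd to the point's own lrd.
    scores: List[float] = []
    for i in range(n):
        ratios = [lrd[j] / lrd[i] for j in neighbours[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores
```

For example, on the points (0, 0), (0, 1), (1, 0), (1, 1) and (8, 8) with k = 2, the four corners of the unit square score exactly 1.0 and the isolated point (8, 8) scores about 10.39. Scores near 1 mean a point is about as dense as its neighbours; scores well above 1 flag local outliers.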

Keywords: data mining; outlier; dense cell; dense region
Article ID: 1001-0505(2005)06-0863-04
Received: 2005-05-25
Revised: 2005-05-25

GDLOF: fast local outlier detection algorithm with grid-based and dense cell
Zhang Jing, Sun Zhihui. GDLOF: fast local outlier detection algorithm with grid-based and dense cell[J]. Journal of Southeast University (Natural Science Edition), 2005, 35(6): 863-866.
Authors: Zhang Jing, Sun Zhihui
Institutions: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2. College of Electronic and Information Engineering, Jiangsu University, Zhenjiang 212001, China
Abstract: Considering the sparsity of high-dimensional, large-scale datasets, and the fact that current outlier detection algorithms are unsatisfactory in computational cost and efficiency when applied to such datasets, a fast local outlier detection algorithm based on grids and dense cells is presented. It builds on the density-based local outlier detection algorithm (LOF) and judges outliers from information about the data in the vicinity of each point, unlike current algorithms, which use the number of points as the parameter for judging density. By proving that points in dense cells and dense regions cannot be outliers, the algorithm reduces the amount of computation and improves the efficiency of LOF while maintaining the desired detection accuracy. Experimental results indicate that the new algorithm is effective and practical for high-dimensional, large-scale datasets.
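The abstract states the pruning idea (skip the LOF computation for points that provably cannot be outliers) but not the paper's exact definition of a dense cell or dense region. The Python sketch below therefore uses a hypothetical stand-in criterion to show the structure only: space is cut into hypercube cells of side `radius`, so any point within `radius` of another lies in the same or an adjacent cell, and a cell is treated as dense when every point in it has at least `k` other points within `radius` among the points of its own and adjacent cells. Points in dense cells are dropped; the rest are the candidates that would still be scored with LOF. The names `grid_cells` and `outlier_candidates`, and the criterion itself, are assumptions rather than the authors' method, and nothing here reproduces the paper's proof that pruned points cannot be outliers.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Cell = Tuple[int, ...]


def grid_cells(points: List[Point], side: float) -> Dict[Cell, List[int]]:
    """Map each grid cell (integer coordinates) to the indices of the points inside it."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / side) for x in p)].append(i)
    return cells


def outlier_candidates(points: List[Point], k: int, radius: float) -> List[int]:
    """Indices of points outside (hypothetically) dense cells; only these would need LOF."""
    if not points:
        return []
    # Cell side == radius, so any point within radius lies in the same or an adjacent cell.
    cells = grid_cells(points, radius)
    offsets = list(itertools.product((-1, 0, 1), repeat=len(points[0])))

    candidates: List[int] = []
    for cell, members in cells.items():
        # Points of this cell and of all adjacent cells (.get does not create new cells).
        nearby = [
            j
            for off in offsets
            for j in cells.get(tuple(c + o for c, o in zip(cell, off)), [])
        ]
        # HYPOTHETICAL dense-cell test: every member has >= k other points within radius.
        dense = all(
            sum(1 for j in nearby if j != i and math.dist(points[i], points[j]) <= radius) >= k
            for i in members
        )
        if not dense:
            candidates.extend(members)
    return sorted(candidates)
```

On the points (0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5) and (5, 5) with k = 2 and radius = 1.0, the four clustered points share one dense cell, so only index 4, the point (5, 5), is returned as a candidate. Enumerating all 3^d adjacent cells is only practical in low dimensions; for the high-dimensional data the paper targets, this step would need a different neighbourhood scheme, and the sketch keeps it only for clarity.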
Keywords: data mining; outlier; dense cell; dense region; GDLOF (grid-based and dense cell based on local outlier factor)
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.