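Fast nested-loop-based outlier detection algorithm for large data sets
Cite this article: Ni Weiwei, Chen Geng, Lu Jieping, Sun Zhihui. Fast nested-loop-based outlier detection algorithm for large data sets[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(3): 463-466.
Authors: Ni Weiwei  Chen Geng  Lu Jieping  Sun Zhihui
Affiliations: 1. School of Computer Science and Engineering, Southeast University, Nanjing 210096
2. Key Laboratory of Audit Information Engineering, Nanjing Audit University, Nanjing 210029
Funding: a project funded by the Chinese Academy of Sciences; the Guangdong Province Doctoral Start-up Fund; a project funded by the Audit Research Institute of the National Audit Office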
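Abstract: Most existing outlier detection algorithms scale poorly and cannot be applied effectively to large data sets. Building on existing distance-based outlier detection algorithms, this paper designs a norm information table as a storage structure and, using a vector inner product inequality together with suitable storage allocation and scheduling strategies, proposes an efficient outlier detection algorithm, DBoda. The algorithm stores the norm of every point during preprocessing, which reduces the number of point-to-point distance computations, and it optimizes the nested-loop method to further reduce I/O cost. Theoretical analysis and experimental results show that the proposed algorithm has low time cost and is suited to large data sets, and that it effectively addresses the time complexity and scalability problems of outlier detection.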

Keywords: large data set  norm information table  vector inner product inequality  outlier detection
Article ID: 1001-0505(2006)03-0463-04
Received: 2005-10-10
Revised: 2005-10-10

Efficient nested-loop based outlier detection algorithm for large data set
Ni Weiwei, Chen Geng, Lu Jieping, Sun Zhihui. Efficient nested-loop based outlier detection algorithm for large data set[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(3): 463-466.
Authors: Ni Weiwei  Chen Geng  Lu Jieping  Sun Zhihui
Institution: 1. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2. Key Laboratory of Audit Information Engineering, Nanjing Audit University, Nanjing 210029, China
Abstract: Most existing outlier detection algorithms scale poorly and cannot be used efficiently on large data sets. To address this problem, a norm information table is designed as the storage structure, a vector inner product inequality is exploited, and suitable storage allocation and I/O scheduling strategies are adopted. On this basis, and building on existing distance-based outlier detection algorithms, an efficient nested-loop-based outlier detection algorithm, DBoda, is proposed for large data sets. The algorithm adopts two strategies. First, during preprocessing, the norm of each data point is stored to reduce the number of distance computations. Second, the nested-loop step is optimized to reduce I/O. Theoretical analysis and experimental results show that DBoda is efficient and suitable for large data sets, and that it addresses the time complexity and scalability problems of outlier detection algorithms.
Keywords: large data set  norm table  vector inner product inequality  outlier detection
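
The abstract does not give the details of DBoda, so the following is only a minimal sketch of the general idea it describes: precompute each point's norm once, then use a norm-based bound to skip full distance computations inside a nested loop. It assumes the common distance-based definition in which a point is an outlier if fewer than min_neighbors other points lie within radius of it, and it assumes the bound | ||x|| - ||y|| | <= ||x - y|| (a consequence of the Cauchy-Schwarz inner product inequality) as the pruning test. The names find_outliers, radius, and min_neighbors are hypothetical, and the storage allocation and I/O scheduling of the paper are not modelled.

```python
import math


def find_outliers(points, radius, min_neighbors):
    """Nested-loop, distance-based outlier search with norm pruning.

    Returns (outlier_indices, full_distance_count). This is an illustrative
    sketch under the assumptions stated above, not the published DBoda code.
    """
    # Preprocessing: the "norm table", one Euclidean norm per point.
    norms = [math.sqrt(sum(c * c for c in p)) for p in points]

    outliers = []
    full_distance_count = 0  # how many exact distances were computed
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i == j:
                continue
            # Pruning: | ||p|| - ||q|| | <= ||p - q||, so a norm gap larger
            # than radius proves q cannot be within radius of p.
            if abs(norms[i] - norms[j]) > radius:
                continue
            full_distance_count += 1
            if math.dist(p, q) <= radius:
                neighbors += 1
                if neighbors >= min_neighbors:
                    break  # p has enough neighbors; it is not an outlier
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers, full_distance_count
```

For example, with four points clustered near the origin and one point at (10, 10), calling find_outliers(points, 0.5, 2) flags only the distant point, and every comparison involving it is rejected by the norm test alone, without an exact distance computation.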
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.