Fast outlier data mining algorithm based on cell in large datasets



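Cite this article:	WANG Ke-ke, CUI Guan-xun, NI Wei, GOU Guang-lei. Fast outlier data mining algorithm based on cell in large datasets[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2010, 22(5): 673-677.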

Authors:	WANG Ke-ke, CUI Guan-xun, NI Wei, GOU Guang-lei

Affiliation (all four authors):	School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054

Funding:	Chongqing Science and Technology Key Research Program (CSTC, 2008AC2126; CSTC, 2009AC2034); Natural Science Foundation of Chongqing (CSTC, 2008BB2065)

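Abstract:	A fast cell-based outlier mining algorithm for large datasets is proposed. Clustering is used to preprocess the data; the data are then placed into the appropriate spatial cells, and the non-empty cells are indexed with a cell dimension tree (CD-tree). Most of the data, which lie in high-density regions and are unrelated to outliers, are filtered out, which avoids a large amount of unnecessary computation. Experiments show that the algorithm mines outliers from large datasets quickly and accurately and increases the speed of outlier detection.
Keywords:	large datasets; outlier data; cell; partitioning; fast
Received:	2010/5/17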
Fast outlier data mining algorithm based on cell in large datasets

WANG Ke-ke, CUI Guan-xun, NI Wei, GOU Guang-lei. Fast outlier data mining algorithm based on cell in large datasets[J]. Journal of Chongqing University of Posts and Telecommunications, 2010, 22(5): 673-677.

Authors:	WANG Ke-ke, CUI Guan-xun, NI Wei, GOU Guang-lei

Abstract:	This paper proposes a fast cell-based algorithm for outlier detection in large datasets (FOMABCLD for short). The algorithm applies a clustering technique to preprocess the data, places each data point into the appropriate cell according to its values, and indexes the non-empty cells with a cell dimension tree (CD-tree). The majority of the data, which lie in high-density cells and are not near any outlier, are filtered out, which avoids a large amount of useless computation. Experiments show that FOMABCLD mines outliers from large datasets quickly and accurately and increases the speed of outlier detection.
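
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' FOMABCLD implementation. It assumes the classic distance-based definition of an outlier (a point with at most `max_neighbors` points, itself included, within distance `radius`), which the abstract does not state. It also omits the clustering preprocessing step, and a plain dictionary of non-empty cells stands in for the CD-tree. The function and parameter names are hypothetical.

```python
import math
from itertools import product


def cell_based_outliers(points, radius, max_neighbors):
    """Return points with at most `max_neighbors` points (counting
    themselves) within `radius`. Illustrative sketch only."""
    if not points:
        return []
    k = len(points[0])
    # Assumed cell side: makes each cell's diagonal equal to radius / 2,
    # so any two points in the same or adjacent cells are within `radius`.
    side = radius / (2 * math.sqrt(k))

    # Index only the non-empty cells (a dict stands in for the CD-tree).
    cells = {}
    for p in points:
        key = tuple(math.floor(x / side) for x in p)
        cells.setdefault(key, []).append(p)

    offsets = list(product((-1, 0, 1), repeat=k))
    outliers = []
    for key, members in cells.items():
        # Count the points in this cell and in its adjacent cells.
        nearby = 0
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(key, off))
            nearby += len(cells.get(neighbor, ()))
        if nearby > max_neighbors:
            continue  # dense region: no point in this cell is an outlier
        # Only points in sparse regions get the exact distance check.
        for p in members:
            count = sum(1 for q in points if math.dist(p, q) <= radius)
            if count <= max_neighbors:
                outliers.append(p)
    return outliers


cluster = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.15, 0.15)]
print(cell_based_outliers(cluster + [(10.0, 10.0)], 1.0, 2))
```

In this usage example the five clustered points fall into one dense cell and are skipped without any distance computation, so only the isolated point is checked exactly. The number of adjacent cells grows as 3^k with the dimension k, so a sketch like this is practical only for low-dimensional data.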

Keywords:
This article is indexed in Wanfang Data and other databases.