
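Parallel Discretization of Data Preparation Optimization in Data Mining
Cite this article: LIU Yun, YUAN Hao-Heng. Parallel Discretization of Data Preparation Optimization in Data Mining[J]. Journal of Sichuan University (Natural Science Edition), 2018, 55(5): 993-999
Authors: LIU Yun, YUAN Hao-Heng
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500 (both authors)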
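Abstract: When mining massive data, discretizing the source data during data preparation can effectively improve mining efficiency. This paper proposes a data preparation algorithm (AOA) that compares discretization methods in parallel and obtains the optimal discretization. For each data set, the algorithm first detects the data set's characteristics to determine its distribution. It then detects and removes outliers according to that distribution and, in parallel, applies the discretization methods suited to the distribution. Finally, it compares the minimum Euclidean distance computed from the entropy, variance index and stability parameter of the different discretization methods and, based on these three parameters, automatically selects the optimal discretized preprocessing result. Simulations running association rule mining on different sample databases compare AOA with four fixed discretization preprocessing methods. When the data are preprocessed with the optimal discretization that AOA selects in parallel, mining yields fewer association rules at every minimum support threshold tested, so efficiency is improved.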

Keywords: data mining; data preparation; parallel invocation; distribution detection; data discretization
Received: 2017-10-24
Revised: 2018-01-05
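The abstract does not name the distribution tests, outlier rules or discretization methods AOA uses. To illustrate only the first stages (distribution check, distribution-dependent outlier removal, concurrent discretization), the Python sketch below makes several assumptions that are not from the paper. It uses a crude normality check based on sample skewness and excess kurtosis. For roughly normal data it removes outliers with the 3-sigma rule, and otherwise it applies Tukey's fences at 1.5 times the interquartile range. It then runs two common unsupervised discretizers, equal-width and equal-frequency binning, concurrently. All function names and thresholds are hypothetical. For simplicity the sketch runs both discretizers on every column instead of choosing methods to match the detected distribution.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def looks_normal(xs, skew_tol=0.5, kurt_tol=1.0):
    """Crude distribution check (assumed): skewness and excess kurtosis near 0."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return False
    n = len(xs)
    skew = sum(((x - mu) / sd) ** 3 for x in xs) / n
    excess_kurt = sum(((x - mu) / sd) ** 4 for x in xs) / n - 3.0
    return abs(skew) < skew_tol and abs(excess_kurt) < kurt_tol


def remove_outliers(xs):
    """Pick the outlier rule according to the detected distribution."""
    if looks_normal(xs):
        # Roughly normal: keep values within 3 standard deviations of the mean.
        mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
        return [x for x in xs if abs(x - mu) <= 3 * sd]
    # Otherwise: Tukey's fences, 1.5 * IQR beyond the first and third quartiles.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if lo <= x <= hi]


def equal_width(xs, k):
    """Assign each value to one of k equally wide intervals."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0  # guard against a constant column
    return [min(int((x - lo) / width), k - 1) for x in xs]


def equal_frequency(xs, k):
    """Assign each value to one of k bins holding about the same number of values."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    labels = [0] * len(xs)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(xs)
    return labels


def discretize_all(xs, k=4):
    """Run every candidate discretizer concurrently; return {name: bin labels}."""
    methods = {"equal_width": equal_width, "equal_frequency": equal_frequency}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, xs, k) for name, fn in methods.items()}
        return {name: f.result() for name, f in futures.items()}
```

In CPython, threads give little speed-up for pure-Python CPU-bound work because of the global interpreter lock. Swapping in concurrent.futures.ProcessPoolExecutor, which works here because the discretizers are module-level functions, is the usual way to get real parallelism on large columns.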

Parallel Discretization of Data Preparation Optimization in Data Mining
LIU Yun and YUAN Hao-Heng. Parallel Discretization of Data Preparation Optimization in Data Mining[J]. Journal of Sichuan University (Natural Science Edition), 2018, 55(5): 993-999
Authors:LIU Yun and YUAN Hao-Heng
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China (both authors)
Abstract: When mining massive data, discretizing the source data during data preparation can effectively improve mining efficiency. In this paper, we propose a data preparation algorithm (AOA) that obtains the optimal discretization by comparing candidate methods in parallel. For each data set, we first detect its characteristics to determine its distribution. Outliers are then detected and removed according to that distribution, and the discretization methods suited to the distribution are run in parallel. Finally, the optimal discretization result is selected automatically by comparing the minimum Euclidean distance computed from the entropy, the variance index and the stability parameter of the different discretization methods. In simulation experiments on different databases, we ran an association rule mining algorithm on the data discretized by AOA and by four typical fixed discretization methods. The results show that, at every minimum support threshold tested, the data discretized by AOA yield the fewest association rules, indicating that AOA makes mining more efficient.
Keywords: data mining; data preparation; parallel invocation; distribution detection; data discretization
This article is indexed in CNKI and other databases.
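The final selection step in the English abstract, which chooses a discretization by minimum Euclidean distance over entropy, variance index and stability, can be sketched roughly as follows. The abstract does not define the three parameters or say what point the distance is measured from. This Python sketch therefore assumes that each candidate method has already been scored on the three criteria and that lower is better for all three. It min-max normalizes each criterion across methods and measures the distance to the ideal point (0, 0, 0). The function select_best is hypothetical and is not the authors' code.

```python
import math


def select_best(scores):
    """Pick the discretization method closest to an ideal point.

    scores maps a method name to a tuple (entropy, variance_index, stability).
    Assumption (not from the paper): lower is better for every criterion.
    """
    names = list(scores)
    dims = len(scores[names[0]])
    # Per-criterion minimum and maximum across all candidate methods.
    lows = [min(scores[n][d] for n in names) for d in range(dims)]
    highs = [max(scores[n][d] for n in names) for d in range(dims)]

    def normalized(vec):
        # Min-max scale each criterion to [0, 1]; a constant criterion maps to 0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(vec, lows, highs)
        ]

    def distance(name):
        # Euclidean distance from the normalized scores to the ideal point (0, ..., 0).
        return math.dist(normalized(scores[name]), [0.0] * dims)

    # Ties go to the first method in insertion order.
    return min(names, key=distance)
```

Normalizing first keeps a criterion with a large numeric range, such as an unscaled variance, from dominating the distance. If a criterion should instead be maximized, it can be negated before calling select_best.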