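Imputing missing value by class center based on grey relational analysis
Cite this article: LIU Sha, YANG You-Long. Imputing missing value by class center based on grey relational analysis [J]. Journal of Sichuan University (Natural Science Edition), 2020, 57(5): 871-878.
Authors: LIU Sha, YANG You-Long
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126 (both authors)
Funding: National Natural Science Foundation of China (61573266)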
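Abstract (translated from the Chinese): Real-world datasets contain missing values, and many data analysis techniques cannot be applied directly to incomplete data. Missing values also markedly reduce the effectiveness of algorithms, so handling missing data is an indispensable preprocessing step. This paper therefore proposes a missing value imputation algorithm based on statistical measures, named grey class center missing value imputation (GCCMVI). It fills missing values using the class center and standard deviation of the data points. Whether the standard deviation is added (or subtracted) is decided by comparing a threshold with the relevance between the instance and the class center, where the relevance is computed by grey relational analysis. After imputation, the completed dataset is used to train a support vector machine (SVM) classifier. Comparisons are carried out on three datasets of different types, with classification accuracy, imputation quality, and imputation time as evaluation criteria. The experimental results show that the proposed algorithm significantly improves classification accuracy and imputation quality.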

Keywords (translated from the Chinese): data analysis; incomplete data; missing value imputation; class center; grey relational analysis
Received: 2019/6/4
Revised: 2019/8/30

Imputing missing value by class center based on grey relational analysis
LIU Sha and YANG You-Long. Imputing missing value by class center based on grey relational analysis [J]. Journal of Sichuan University (Natural Science Edition), 2020, 57(5): 871-878.
Authors: LIU Sha and YANG You-Long
Abstract: Many data mining techniques cannot be applied directly to incomplete datasets, that is, datasets containing missing values. Missing values also significantly reduce the effectiveness of the algorithms applied, so missing data handling is an indispensable preprocessing step. This paper proposes an imputation method based on statistical measures, named grey class center missing value imputation (GCCMVI). Missing values are imputed from the class center and the standard deviation. Whether the standard deviation is added (or subtracted) is determined by comparing a threshold with the relevance between the class center and the instance, where relevance is computed by grey relational analysis. After the missing values are filled, the complete dataset is used to train a support vector machine (SVM) classifier. Comparative experiments are carried out on three datasets of different types, with classification accuracy, imputation performance, and imputation time as evaluation criteria. The experimental results show that the proposed method significantly improves classification accuracy and imputation performance.
Keywords: data mining; incomplete data; missing value imputation; class center; grey relational analysis
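
The abstract names the ingredients of GCCMVI (class center, standard deviation, a threshold, and a grey relational relevance) but does not give the exact rules. The sketch below is one possible reading, not the authors' implementation. It uses the standard grey relational coefficient with the conventional distinguishing coefficient rho = 0.5. The function names `grey_relational_grades` and `impute_class`, the default `threshold=0.7`, and the rule for choosing the sign of the standard deviation are all assumptions made for illustration. Each instance is assumed to have at least one observed attribute, and `None` marks a missing value.

```python
from statistics import mean, pstdev


def grey_relational_grades(reference, rows, rho=0.5):
    """Grey relational grade of each row with respect to `reference`.

    Uses the standard grey relational coefficient
    (d_min + rho * d_max) / (d + rho * d_max), averaged over the
    attributes that a row has observed. None marks a missing value.
    """
    deltas = [
        [abs(r - v) for r, v in zip(reference, row) if v is not None]
        for row in rows
    ]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every observed value coincides with the reference
        return [1.0 for _ in rows]
    return [
        mean([(d_min + rho * d_max) / (d + rho * d_max) for d in row])
        for row in deltas
    ]


def impute_class(rows, threshold=0.7, rho=0.5):
    """Fill missing values for the instances of ONE class.

    Assumed rule (hypothetical, not taken from the paper):
    - grade >= threshold: fill with the class center;
    - grade <  threshold: fill with center + std if the observed values
      lie above the center on balance, otherwise center - std.
    """
    n_cols = len(rows[0])
    observed = [
        [row[j] for row in rows if row[j] is not None] for j in range(n_cols)
    ]
    center = [mean(col) for col in observed]    # per-attribute class center
    spread = [pstdev(col) for col in observed]  # per-attribute std deviation
    grades = grey_relational_grades(center, rows, rho)

    filled = []
    for row, grade in zip(rows, grades):
        if grade >= threshold:
            sign = 0
        else:
            bias = sum(v - c for v, c in zip(row, center) if v is not None)
            sign = 1 if bias >= 0 else -1
        filled.append([
            c + sign * s if v is None else v
            for v, c, s in zip(row, center, spread)
        ])
    return filled


# Toy data for a single class; the second attribute is missing twice.
rows = [[1.0, 2.0], [3.0, 4.0], [2.0, None], [10.0, None]]
filled = impute_class(rows, threshold=0.7)
print(filled)
```

In this toy run the third instance is close to the class center, so it receives the center value, while the fourth is an outlier and receives the center shifted by one standard deviation. In the workflow the abstract describes, the filled rows of every class would then be combined and passed to an SVM classifier.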