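Parallel Support Vector Machine Algorithm Based on Granularity and Information Entropy
Cite this article: Mao Yimin, Zhang Liuxin, Lu Xinrong. The Parallel SVM Algorithm by Using Granularity and Information Entropy[J]. Science Technology and Engineering, 2021, 21(10): 4124-4132.
Authors: Mao Yimin  Zhang Liuxin  Lu Xinrong
Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000; College of Applied Science, Jiangxi University of Science and Technology, Ganzhou 341000
Funding: National Key Research and Development Program of China (No. 2018YFC1504705); National Natural Science Foundation of China (No. 41562019)
Abstract: Parallel support vector machine (SVM) algorithms in big data environments are sensitive to noisy data and suffer from redundant training samples. To address these problems, this paper proposes GIESVM-MR (the SVM algorithm by using granularity and information entropy based on MapReduce). The algorithm first introduces a noise cleaning (NC) strategy that evaluates the importance of each feature attribute and obtains the correlation between samples and classes, in order to identify and delete noisy data. It then introduces a granulation-based data compression strategy (data compression based on granulation, GDC), which filters information granules to retain class-boundary samples and delete non-support vectors, yielding a smaller data set and thereby addressing training-sample redundancy in big data environments. Finally, it combines the idea of Bagging with the MapReduce computing model to train SVMs in parallel and generate the final classification model. Experiments show that GIESVM-MR achieves better classification performance and runs more efficiently on large-scale data sets.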

Keywords: big data  noise  granularity  information entropy  support vector
Received: 2020/7/10
Revised: 2021/4/2

The Parallel SVM Algorithm by Using Granularity and Information Entropy
Mao Yimin, Zhang Liuxin, Lu Xinrong. The Parallel SVM Algorithm by Using Granularity and Information Entropy[J]. Science Technology and Engineering, 2021, 21(10): 4124-4132.
Authors: Mao Yimin  Zhang Liuxin  Lu Xinrong
Institution: School of Information Engineering, Jiangxi University of Science and Technology; College of Applied Science, Jiangxi University of Science and Technology
Abstract: Parallel SVM algorithms in big data environments are sensitive to noisy data and suffer from redundant training samples. To address these problems, this paper proposes a parallel SVM algorithm based on granularity and information entropy, named GIESVM-MR. First, the algorithm introduces a noise cleaning (NC) method that evaluates the importance of each feature attribute and obtains the correlation between samples and categories, which allows noisy data to be identified and deleted. Second, it introduces a data compression based on granulation (GDC) strategy, which screens information granules to retain class-boundary samples and delete non-support vectors. This yields a smaller data set and addresses training-sample redundancy in big data environments. Finally, the classification model is generated by combining the idea of Bagging with the MapReduce computing model. Experimental results show that GIESVM-MR not only improves classification accuracy but also reduces the time complexity of parallel SVM algorithms in big data environments.
Keywords:big data  noise  granularity  information entropy  support vector
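
The abstract says the NC strategy evaluates the importance of each feature attribute, and the title ties the method to information entropy, but no formula is given here. As a minimal sketch, assuming importance is scored by standard information gain over a discrete feature (this choice, the function names, and the toy data are illustrative assumptions, not the authors' definition), the computation could look like this:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after grouping samples by a discrete feature.

    Assumption: this standard information-gain score stands in for the
    feature-importance measure that the abstract mentions but does not define.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the label entropy inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four samples, two classes, two candidate features.
labels = ["pos", "pos", "neg", "neg"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # carries no class information

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

On this toy data the first feature receives the maximum possible gain and the second receives none. How such scores would be turned into a per-sample correlation with the class, and the threshold for discarding a sample as noise, are not specified in the abstract and are left open here.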
This article is indexed in CNKI, Wanfang Data, and other databases.