MapReduce Highly Random Fuzzy Forest Algorithm for Noisy Large Data
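Cite this article: WANG Mei, LUO Fen, ZHANG Bao-hua. MapReduce Highly Random Fuzzy Forest Algorithm for Noisy Large Data[J]. Journal of Southwest China Normal University (Natural Science Edition), 2019, 44(11): 110-117
Authors: WANG Mei, LUO Fen, ZHANG Bao-hua
Affiliations: 1. Experimental Training and Teaching Department, Changzhou Vocational Institute of Engineering, Changzhou, Jiangsu 213164; 2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000; 3. School of Intelligent Equipment and Information, Changzhou Vocational Institute of Engineering, Changzhou, Jiangsu 213164
Funding: Henan Provincial Science and Technology Research Program (162102310090).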
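Abstract:
To address the growing problem of classifying noisy big data, a highly random fuzzy forest algorithm is proposed. The algorithm generates fuzzy partitions of continuous attributes during decision tree learning. A distributed implementation of the algorithm in the MapReduce framework is also given, which learns an ensemble of fuzzy decision trees from large datasets contaminated by attribute noise. The distributed implementation adopts an efficient strategy for allocating computation, which yields good scalability, and it enables the fuzzy random forest to handle learning and classification on big datasets. The highly random fuzzy forest algorithm achieves high-accuracy classification of noisy big data and lays a foundation for subsequent big data analysis. Experimental results show that the proposed algorithm is more accurate than existing algorithms and that, under attribute noise, its classification accuracy is also higher than that of the random forest algorithm, demonstrating the feasibility and effectiveness of the algorithm.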

Keywords: random forest; fuzzy decision tree; highly random fuzzy forest; noisy big data
Received: 2018-07-25
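The abstract says the algorithm generates fuzzy partitions of continuous attributes during tree learning but does not give the construction. The sketch below shows one common way to fuzzy-partition a continuous attribute: triangular membership functions whose degrees sum to 1 at every point in the attribute's range (a strong fuzzy partition). The randomly spaced peaks produced by the hypothetical `random_peaks` helper are an assumption meant to echo the "highly random" idea. They are not the authors' procedure, and every name here is illustrative.

```python
import random
from typing import List, Sequence, Tuple

FuzzySet = Tuple[float, float, float]  # (left foot, peak, right foot) of a triangle


def random_peaks(lo: float, hi: float, k: int, rng: random.Random) -> List[float]:
    """Return k strictly increasing peaks from lo to hi with randomly sized gaps.

    Hypothetical: the random spacing only illustrates a "highly random" partition.
    """
    if k < 2 or not hi > lo:
        raise ValueError("need k >= 2 and hi > lo")
    gaps = [rng.uniform(0.5, 1.5) for _ in range(k - 1)]  # strictly positive gaps
    total = sum(gaps)
    peaks, acc = [lo], 0.0
    for g in gaps[:-1]:
        acc += g
        peaks.append(lo + (hi - lo) * acc / total)
    peaks.append(hi)  # pin the last peak exactly at the upper bound
    return peaks


def triangular_partition(peaks: Sequence[float]) -> List[FuzzySet]:
    """Build one triangle per peak; the neighbouring peaks serve as its feet."""
    sets: List[FuzzySet] = []
    for i, b in enumerate(peaks):
        a = peaks[i - 1] if i > 0 else b               # first set has no left slope
        c = peaks[i + 1] if i < len(peaks) - 1 else b  # last set has no right slope
        sets.append((a, b, c))
    return sets


def membership(x: float, fs: FuzzySet) -> float:
    """Degree to which x belongs to the triangular fuzzy set fs."""
    a, b, c = fs
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def fuzzify(x: float, sets: List[FuzzySet]) -> List[float]:
    """Membership of x in every set; x is first clamped to [first peak, last peak]."""
    x = min(max(x, sets[0][1]), sets[-1][1])
    return [membership(x, fs) for fs in sets]
```

For example, `fuzzify(2.5, triangular_partition([0.0, 5.0, 10.0]))` returns `[0.5, 0.5, 0.0]`. A value halfway between two peaks belongs equally to both neighbouring fuzzy sets, so a small perturbation of an attribute value shifts its memberships gradually instead of flipping it across a crisp threshold.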

MapReduce Highly Random Fuzzy Forest Algorithm for Noisy Large Data
WANG Mei, LUO Fen, ZHANG Bao-hua. MapReduce Highly Random Fuzzy Forest Algorithm for Noisy Large Data[J]. Journal of Southwest China Normal University (Natural Science Edition), 2019, 44(11): 110-117
Authors: WANG Mei, LUO Fen, ZHANG Bao-hua
Affiliations: 1. Experimental Training and Teaching Department, Changzhou Vocational Institute of Engineering, Changzhou, Jiangsu 213164, China; 2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China; 3. School of Intelligent Equipment and Information, Changzhou Vocational Institute of Engineering, Changzhou, Jiangsu 213164, China
Abstract:
To address the growing problem of classifying noisy big data, a highly random fuzzy forest algorithm is proposed. It generates fuzzy partitions of continuous attributes during decision tree learning, and a distributed implementation of the algorithm in the MapReduce framework is given for learning an ensemble of fuzzy decision trees from large datasets contaminated by attribute noise. The distributed implementation adopts an efficient strategy for allocating the computation, which yields good scalability, and it enables the fuzzy random forest to learn from and classify big datasets. The highly random fuzzy forest algorithm achieves high-accuracy classification of noisy big data, laying a foundation for future big data analysis. Experimental results show that the proposed method achieves higher classification accuracy than existing algorithms and, under attribute noise, also outperforms the random forest algorithm, which demonstrates the feasibility and effectiveness of the proposed algorithm.
Keywords: random forest; fuzzy decision tree; highly random fuzzy forest; noisy big data
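The MapReduce side of the method is likewise only summarized. Below is a plain-Python simulation (not Hadoop code) of one plausible decomposition. Each map task learns several trees on its own input split and emits them under a single key. The reduce step takes their union as the forest, and classification sums the trees' membership-weighted class scores. `FuzzyStump` (a one-split fuzzy tree with an Extra-Trees-style random cut and a linear overlap zone), the key name, and the seeding scheme are hypothetical simplifications. The paper's trees, data partitioning, and aggregation rule may differ.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (attribute vector, class label)


class FuzzyStump:
    """Hypothetical stand-in for a fuzzy decision tree: one random attribute,
    one random cut, and a linear fuzzy overlap of half-width w around the cut."""

    def __init__(self, data: Sequence[Sample], rng: random.Random, overlap: float = 0.1):
        self.attr = rng.randrange(len(data[0][0]))
        vals = [x[self.attr] for x, _ in data]
        lo, hi = min(vals), max(vals)
        self.cut = rng.uniform(lo, hi)            # extremely randomized cut point
        self.w = max((hi - lo) * overlap, 1e-12)  # half-width of the overlap zone
        self.left: Dict[int, float] = {}
        self.right: Dict[int, float] = {}
        for x, y in data:  # each sample is split fractionally between both leaves
            mu_r = self._mu_right(x[self.attr])
            self.left[y] = self.left.get(y, 0.0) + (1.0 - mu_r)
            self.right[y] = self.right.get(y, 0.0) + mu_r

    def _mu_right(self, v: float) -> float:
        """Membership of v in the right branch: 0 below cut - w, 1 above cut + w."""
        return min(1.0, max(0.0, (v - self.cut + self.w) / (2 * self.w)))

    def scores(self, x: Sequence[float]) -> Dict[int, float]:
        """Leaf class frequencies weighted by the branch memberships of x."""
        mu_r = self._mu_right(x[self.attr])
        out: Dict[int, float] = {}
        for leaf, weight in ((self.left, 1.0 - mu_r), (self.right, mu_r)):
            total = sum(leaf.values()) or 1.0
            for label, count in leaf.items():
                out[label] = out.get(label, 0.0) + weight * count / total
        return out


def map_train(split_id: int, split: Sequence[Sample], trees_per_map: int,
              seed: int) -> List[Tuple[str, FuzzyStump]]:
    """Map task: learn several trees on one input split, emit them under one key."""
    rng = random.Random(seed * 100003 + split_id)  # separate random stream per split
    return [("forest", FuzzyStump(split, rng)) for _ in range(trees_per_map)]


def reduce_forest(pairs: Iterable[Tuple[str, FuzzyStump]]) -> List[FuzzyStump]:
    """Reduce task: the forest is the union of all trees emitted by the mappers."""
    return [tree for _, tree in pairs]


def run_mapreduce(splits: Sequence[Sequence[Sample]], trees_per_map: int = 5,
                  seed: int = 0) -> List[FuzzyStump]:
    emitted: List[Tuple[str, FuzzyStump]] = []
    for i, split in enumerate(splits):  # on a cluster these would run in parallel
        emitted.extend(map_train(i, split, trees_per_map, seed))
    return reduce_forest(emitted)  # a single key, so the shuffle is trivial here


def classify(forest: Sequence[FuzzyStump], x: Sequence[float]) -> int:
    """Sum the fuzzy class scores of all trees and return the top class."""
    total: Dict[int, float] = {}
    for tree in forest:
        for label, s in tree.scores(x).items():
            total[label] = total.get(label, 0.0) + s
    return max(total, key=total.get)
```

Training each tree on a single split keeps the map tasks independent of one another, which is what lets a scheme like this scale with the number of splits. The trade-off is that no individual tree sees the whole dataset.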
This article is indexed in CNKI and other databases.