
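Machine-Learning-Based Parallel Integration Method for Small Data Sets in Databases
Cite this article: Wang Jun, Cheng Xiansheng, Wang Shoudong. Machine-Learning-Based Parallel Integration Method for Small Data Sets in Databases[J]. Science Technology and Engineering, 2019, 19(16): 239-244.
Authors: Wang Jun, Cheng Xiansheng, Wang Shoudong
Affiliations: Department of Computer Technology and Information Management, Inner Mongolia Agricultural University; Department of Computer Technology and Information Management, Inner Mongolia Agricultural University; Department of Food Engineering Technology, Inner Mongolia Agricultural University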
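Abstract: Traditional methods cannot design an optimal network model according to the training sample size, and their integration efficiency is low. To address these drawbacks, a parallel integration method for small data sets in databases is studied using machine learning. The naive Bayes algorithm is chosen as the learner: under the conditional independence assumption, the prior probability of the target is computed, Bayes' theorem is used to obtain the posterior probability, and the posterior probabilities are compared to make the classification decision. Different naive Bayes base classifiers are combined into an ensemble classifier. The base classifier is first trained on the original database; the sample distribution of the small data sets in the database is then adjusted according to the classification results, and the adjusted data are used as a new data set to train the base classifier again. The base classifiers are combined by weighting according to their performance to produce a strong classifier, which carries out the integration of the small data sets in the database. Parallel data integration is completed through MapReduce parallel processing, and the parallel integration results are output. The effectiveness of the proposed method is verified through simulation experiments and a case analysis. The results show that, for the same training sample size, the proposed method has the highest classification accuracy and the smallest fluctuation, and that across different ensemble sizes its classification accuracy remains the highest with the smallest fluctuation; the proposed method attains optimal data integration, with a lower data failure ratio and a higher synthesis ratio. The proposed method therefore offers high integration accuracy, strong computational stability, good integration results, and high efficiency.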

Keywords: machine learning; database; small data set; parallel integration
Received: 2018/12/21
Revised: 2019/2/16

Research on Parallel Integration of Small Data Sets in Database Based on Machine Learning
Wang Jun, Cheng Xiansheng, Wang Shoudong. Research on Parallel Integration of Small Data Sets in Database Based on Machine Learning[J]. Science Technology and Engineering, 2019, 19(16): 239-244.
Authors: Wang Jun, Cheng Xiansheng, Wang Shoudong
Institution: Inner Mongolia Agricultural University, Department of Computer Technology and Information Management; Inner Mongolia Agricultural University, Department of Computer Technology and Information Management; Inner Mongolia Agricultural University, Department of Food Engineering Technology
Abstract: To overcome the drawbacks of traditional methods, which cannot design the optimal network model according to the training sample size and have low integration efficiency, a parallel integration method for small data sets in databases is studied using machine learning. The naive Bayes algorithm is chosen: under the assumption of conditional independence, it calculates the prior probability of the target, uses Bayes' theorem to find the posterior probability, and compares the posterior probabilities to complete the classification decision. Different naive Bayes base classifiers are treated as an ensemble classifier. The base classifier is trained on the original database, and the sample distribution of the small data sets in the database is adjusted according to the classification results; the adjusted data are then used as a new data set to train the base classifier. According to the performance of the base classifiers, a strong classifier is generated by combining them with weights, realizing the integration of small data sets in the database. Parallel data integration is accomplished by MapReduce parallel processing, and the results of parallel integration are output. The validity of the proposed method is verified by simulation experiments and case analysis. The results show that the proposed method has the highest classification accuracy and the smallest fluctuation under the same training sample size, and the highest classification accuracy and the smallest fluctuation under different ensemble sizes. The proposed method can achieve the optimal integration of data: the data failure ratio decreases and the synthesis ratio increases. It can be seen that the proposed method has high integration accuracy, strong computational stability, good integration effect, and high efficiency.
Keywords:machine learning  database  small data set  parallel integration
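
The training loop described in the abstract (fit a naive Bayes base classifier, reweight the samples according to its errors, retrain, then combine the base classifiers by weighted vote) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the reweighting rule, so an AdaBoost-style update is assumed here, features are assumed to be categorical, Laplace smoothing is added, and all function names and the toy data are hypothetical. The MapReduce stage is not shown.

```python
import math
from collections import defaultdict

def train_nb(X, y, w):
    """Fit a categorical naive Bayes model on weighted samples."""
    scale = len(X) / sum(w)  # rescale weights so they sum to the sample count
    classes = sorted(set(y))
    prior = {c: 0.0 for c in classes}
    # cond[c][j][v]: total weight of class-c samples whose feature j equals v
    cond = {c: defaultdict(lambda: defaultdict(float)) for c in classes}
    values = defaultdict(set)  # distinct values observed for each feature
    for xi, yi, wi in zip(X, y, w):
        prior[yi] += wi * scale
        for j, v in enumerate(xi):
            cond[yi][j][v] += wi * scale
            values[j].add(v)
    return classes, prior, cond, values

def predict_nb(model, x, alpha=1.0):
    """Return the class with the largest log-posterior (Laplace-smoothed)."""
    classes, prior, cond, values = model
    total = sum(prior.values())
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log((prior[c] + alpha) / (total + alpha * len(classes)))
        for j, v in enumerate(x):
            num = cond[c][j][v] + alpha
            den = prior[c] + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best

def boost(X, y, rounds=5):
    """AdaBoost-style ensemble of naive Bayes base classifiers (assumed scheme)."""
    n = len(X)
    w = [1.0 / n] * n  # start from a uniform sample distribution
    ensemble = []
    for _ in range(rounds):
        model = train_nb(X, y, w)
        pred = [predict_nb(model, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
        if err >= 0.5:  # base classifier no better than chance: stop
            if not ensemble:
                ensemble.append((1.0, model))
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect classifier
        a = 0.5 * math.log((1 - err) / err)  # vote weight of this base classifier
        ensemble.append((a, model))
        if err <= 1e-10:  # nothing left to reweight
            break
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(a if p != t else -a) for wi, p, t in zip(w, pred, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the base classifiers."""
    votes = defaultdict(float)
    for a, model in ensemble:
        votes[predict_nb(model, x)] += a
    return max(votes, key=votes.get)

# Toy data (hypothetical): the label depends only on the first feature.
X_demo = [("sunny", "hot"), ("sunny", "mild"), ("sunny", "cool"),
          ("rainy", "hot"), ("rainy", "mild"), ("rainy", "cool")]
y_demo = ["yes", "yes", "yes", "no", "no", "no"]
ensemble = boost(X_demo, y_demo)
print(predict(ensemble, ("sunny", "mild")))
```

In a MapReduce setting, one plausible split (again an assumption, not something the abstract specifies) would be to have mappers accumulate the weighted counts in `train_nb` over data partitions and a reducer sum them, since those counts are additive.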
This article is indexed in CNKI and other databases.