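A joint algorithm based on improved active learning and self-training
Cite this article: Lü Jia, Fu Quhan. A joint algorithm based on improved active learning and self-training [J]. Journal of Beijing Normal University (Natural Science), 2022, 58(1): 25-32.
Authors: Lü Jia, Fu Quhan
Affiliation: College of Computer and Information Sciences, Chongqing Normal University; Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China
Funding: National Natural Science Foundation of China (11971084)
Abstract: To address the excessive cost of manual labeling when active learning is applied to large data sets, and the impact of mislabeled points in semi-supervised self-training, this paper proposes a joint algorithm in which active learning and semi-supervised self-training are trained in alternating iterations. During training, odd-numbered rounds use the active learning algorithm and even-numbered rounds use the self-training algorithm, so that each algorithm compensates for the other's weaknesses. The predictions that self-training makes for unlabeled samples reduce the labeling burden of active learning, while active learning labels the samples that are likely to become noise, which reduces the labeling errors made during self-training. The paper also proposes an improved active learning algorithm based on density peaks clustering and membership degree: the initial unlabeled samples are grouped into clusters, and within each cluster some samples are selected for manual labeling according to the difference in membership degree, yielding a balanced set of samples that represents the overall structure of the data. Simulation experiments show that the proposed joint algorithm outperforms both single algorithms. Compared with common active learning algorithms, the improved active learning algorithm achieves significantly better classification performance, and it is even more effective when used within the joint algorithm.

Keywords: active learning; self-training algorithm; density peaks clustering; joint algorithm; membership degree
Received: 2021-08-10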
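
The alternating schedule described in the abstract (active learning in odd rounds, self-training in even rounds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the distance-margin confidence score, plain uncertainty sampling in the odd rounds, and the names `joint_train`, `oracle`, `rounds`, and `batch` are all hypothetical stand-ins. The abstract does not specify the base classifier or the batch sizes.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroids(X, y):
    """Mean point of each class in the labeled set."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in groups.items()}

def predict_with_margin(cents, x):
    """Return (label, margin); a small margin marks an uncertain prediction."""
    d = sorted((dist(x, c), label) for label, c in cents.items())
    margin = d[1][0] - d[0][0] if len(d) > 1 else float("inf")
    return d[0][1], margin

def joint_train(X_lab, y_lab, X_unl, oracle, rounds=6, batch=5):
    """Alternate active learning (odd rounds) and self-training (even rounds)."""
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(range(len(X_unl)))      # indices of still-unlabeled samples
    n_queries = 0
    for r in range(1, rounds + 1):
        if not pool:
            break
        cents = centroids(X_lab, y_lab)
        scored = []
        for i in pool:
            label, margin = predict_with_margin(cents, X_unl[i])
            scored.append((margin, i, label))
        scored.sort()                   # least confident first
        odd = r % 2 == 1
        # Odd round: query the least confident samples.
        # Even round: pseudo-label the most confident samples.
        chosen = scored[:batch] if odd else scored[-batch:]
        for _, i, label in chosen:
            if odd:
                label = oracle(i)       # assumed human annotator
                n_queries += 1
            X_lab.append(X_unl[i])
            y_lab.append(label)
            pool.remove(i)
    return centroids(X_lab, y_lab), n_queries

# Toy data (an assumption for illustration): two well-separated 5x5 grids.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
X_unl = grid + [(5 + a, 5 + b) for a, b in grid]
y_true = [0] * 25 + [1] * 25
model, n_queries = joint_train([(0.2, 0.2), (5.2, 5.2)], [0, 1], X_unl,
                               oracle=lambda i: y_true[i])
print(n_queries, model)
```

In this sketch only the odd rounds consume annotation effort, so the number of oracle queries grows at half the rate of a pure active learning loop with the same batch size, which mirrors the cost argument made in the abstract.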

A joint algorithm by combined improved active learning and self-training
Lü Jia,FU Quhan.A joint algorithm by combined improved active learning and self-training[J].Journal of Beijing Normal University(Natural Science),2022,58(1):25-32.
Authors:Lü Jia  FU Quhan
Institution: College of Computer and Information Sciences, Chongqing Normal University; Chongqing Digital Agriculture Service Engineering Technology Research Center, 401331, Chongqing, China
Abstract: To address the high cost of manual labeling when active learning is applied to large data sets and the influence of mislabeled points in semi-supervised self-training, a joint algorithm that alternates active learning and semi-supervised self-training was proposed. During training, active learning was used in odd-numbered rounds and self-training in even-numbered rounds, so that the two algorithms compensated for each other's deficiencies. The predictions made by self-training for unlabeled samples reduced the labeling burden of active learning, while active learning labeled the samples most likely to become noise, reducing the labeling errors made during self-training. An improved active learning algorithm based on density peaks clustering and membership degree was also proposed: the initial unlabeled samples were clustered, and some samples in each cluster were selected for manual labeling according to the difference in membership degree, yielding balanced samples that reflect the overall structure of the data. Simulation experiments showed that the proposed joint algorithm outperformed both single algorithms. Compared with common active learning algorithms, the improved active learning algorithm achieved significantly better classification performance, and applying it within the joint algorithm gave a further advantage.
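Keywords: active learning; self-training algorithm; density peaks clustering; joint algorithm; membership degree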
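
The cluster-then-select step of the improved active learning algorithm can be illustrated with a short sketch. Several details here are assumptions, because the abstract does not define them: the local density uses a Gaussian kernel with a hypothetical cutoff `d_c`, the membership degree is taken to be a fuzzy c-means style inverse-squared-distance weight to each cluster center, and the samples queried in each cluster are those with the smallest gap between their two largest memberships. The function names and the `per_cluster` parameter are likewise hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def density_peaks(X, d_c, n_clusters):
    """Density peaks clustering; returns (center indices, cluster id per point)."""
    n = len(X)
    D = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Local density rho with a Gaussian kernel of width d_c.
    rho = [sum(math.exp(-(D[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])   # decreasing density
    delta = [0.0] * n      # distance to the nearest point of higher density
    parent = [-1] * n      # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(D[i])        # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: D[i][k])
        delta[i], parent[i] = D[i][j], j
    # Centers are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:        # denser points are always labeled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels

def membership_gaps(X, centers):
    """Gap between the two largest memberships of each point (assumed definition)."""
    gaps = []
    for x in X:
        inv = [1.0 / (dist(x, X[c]) ** 2 + 1e-12) for c in centers]
        total = sum(inv)
        u = sorted((v / total for v in inv), reverse=True)
        gaps.append(u[0] - u[1] if len(u) > 1 else 1.0)
    return gaps

def select_queries(X, d_c, n_clusters, per_cluster):
    """Pick the per_cluster most ambiguous samples inside every cluster."""
    centers, labels = density_peaks(X, d_c, n_clusters)
    gaps = membership_gaps(X, centers)
    chosen = []
    for c in range(n_clusters):
        members = sorted((i for i in range(len(X)) if labels[i] == c),
                         key=lambda i: gaps[i])
        chosen.extend(members[:per_cluster])
    return chosen, labels

# Toy data (an assumption for illustration): two well-separated 5x5 grids.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
X = grid + [(5 + a, 5 + b) for a, b in grid]
chosen, labels = select_queries(X, d_c=0.3, n_clusters=2, per_cluster=3)
print(chosen)
```

Drawing the same number of queries from every cluster is what keeps the labeled set balanced across the structure of the data, which is the property the abstract attributes to the improved selection step.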
This article is indexed in Wanfang Data and other databases.