
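A classification algorithm based on co-training semi-supervised learning
Cite this article: WANG Yu, LI Yanhui. A classification algorithm based on co-training semi-supervised learning[J]. Journal of Central China Normal University (Natural Sciences), 2021, 55(6): 1020-1029.
Authors: WANG Yu  LI Yanhui
Affiliation: School of Information Management, Central China Normal University, Wuhan 430079
Abstract: To improve classifier performance when only a few labeled samples are available, a semi-supervised sample selection method based on the collaboration of multiple classifiers is proposed; it uses unlabeled samples for sample enhancement and thereby improves the classifier's generalization ability. Relying on mutual supervision among the classifiers and on the principle of label agreement between them, the labeled samples serve as the training set, two classifiers (SVM and RF) are co-trained, and their class labels and certainty values act as constraints for selecting the most representative samples from the unlabeled set to form an enhanced sample set. Accuracy is the evaluation criterion used to verify the algorithm's effect on generalization performance. The algorithm is tested on a handwritten digit dataset (the Mnist character library) and the Landsat soil dataset. The experimental results show that, compared with a classifier built from the small original training set, the classifier built from the enhanced samples achieves higher accuracy for every class. Overall accuracy on the two datasets rises by 5.97% and 7.02%, respectively. In the Mnist dataset, the digit 5 shows the largest gain (11.9 percentage points, from 79.3% to 91.2%); in the Landsat soil dataset, soil class 3 shows the most pronounced gain (15.8 percentage points, from 73.5% to 89.3%). These results show that the algorithm significantly improves the classifier's generalization performance. Compared with the classic KNN, Co-training and Co-forest algorithms, the proposed algorithm makes the fullest use of the information in unlabeled samples and achieves the best accuracy, which demonstrates its advantage.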

Keywords: semi-supervised learning  co-training  support vector machine  random forest  sample enhancement
Received: 2021-12-15

A semi-supervised image classification algorithm based on collaborative training
WANG Yu,LI Yanhui.A semi-supervised image classification algorithm based on collaborative training[J].Journal of Central China Normal University(Natural Sciences),2021,55(6):1020-1029.
Authors:WANG Yu  LI Yanhui
Institution:(School of Information Management,Central China Normal University, Wuhan 430079, China)
Abstract: To improve classifier performance when only a small number of labeled samples is available, a semi-supervised sample selection method based on the collaboration of multiple classifiers is proposed. It uses unlabeled samples for sample enhancement and thereby improves the generalization ability of the classifier. Relying on the mutual supervision of multiple classifiers and on the principle that their labels must agree, the labeled samples are used as the training set and two classifiers, SVM and RF, are co-trained. The class labels and certainty values of the classifiers serve as constraints: the most representative samples are selected from the unlabeled sample set to form the enhanced sample set, and accuracy is used as the evaluation criterion to verify the effect of the algorithm on the generalization performance of the classifier. The algorithm is tested on the handwritten digit dataset (Mnist character library) and the Landsat soil dataset. The experimental results show that, compared with the classifier built from a small number of original training samples, the classifier built from the enhanced samples achieves higher accuracy for every class. The overall accuracy on the two datasets increases by 5.97% and 7.02%, respectively. In the Mnist dataset, the digit 5 shows the largest increase (11.9 percentage points, from 79.3% to 91.2%), and in the Landsat soil dataset, soil class 3 shows the most pronounced increase (15.8 percentage points, from 73.5% to 89.3%); the results indicate that the algorithm has a certain degree of robustness. In addition, compared with the classic KNN, Co-training and Co-forest algorithms, the proposed algorithm makes the fullest use of unlabeled sample information and achieves the best accuracy, which demonstrates the advantage of the proposed algorithm.
Keywords:semi-supervised classification  collaborative training  SVM  RF  image classification  
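
The selection rule summarized in the abstract (keep an unlabeled sample only when both classifiers assign it the same label and both are sufficiently certain) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `select_agreed_samples`, the use of the top class probability as the "certainty value", and the threshold of 0.9 are all assumptions made for illustration; the abstract does not state how certainty is computed or which threshold is used. In practice the two probability tables might come from the `predict_proba` outputs of an SVM and a random forest, but here they are plain hand-written lists so the example stays self-contained.

```python
from typing import List, Sequence, Tuple


def select_agreed_samples(
    svm_probs: Sequence[Sequence[float]],
    rf_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs accepted by both classifiers.

    svm_probs[i][k] and rf_probs[i][k] are the class-k probabilities that the
    two classifiers assign to unlabeled sample i. Treating the top class
    probability as the "certainty value" is an assumption of this sketch.
    """
    if len(svm_probs) != len(rf_probs):
        raise ValueError("both classifiers must score the same samples")

    selected = []
    for i, (p_svm, p_rf) in enumerate(zip(svm_probs, rf_probs)):
        # Predicted label = index of the largest class probability.
        label_svm = max(range(len(p_svm)), key=p_svm.__getitem__)
        label_rf = max(range(len(p_rf)), key=p_rf.__getitem__)
        # Constraint 1: the two classifiers must predict the same label.
        if label_svm != label_rf:
            continue
        # Constraint 2: both must be sufficiently certain about that label.
        if p_svm[label_svm] >= threshold and p_rf[label_rf] >= threshold:
            selected.append((i, label_svm))
    return selected


# Toy scores for four unlabeled samples and two classes (made-up numbers).
svm_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92], [0.97, 0.03]]
rf_probs = [[0.91, 0.09], [0.30, 0.70], [0.05, 0.95], [0.70, 0.30]]
print(select_agreed_samples(svm_probs, rf_probs))
```

In a full loop, the selected samples and their pseudo-labels would be appended to the labeled training set to form the enhanced sample set, after which the classifiers are retrained; whether the authors iterate this step, and how often, is not stated in the abstract.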
This article is indexed in Wanfang Data and other databases.