Representative-based cross validation classification
Citation: WANG Xuan, GU Feng, MIN Fan, SUN Yuanqiu. Representative-based cross validation classification[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2021, 33(5): 826-833. DOI: 10.3979/j.issn.1673-825X.202105160162
Authors: WANG Xuan  GU Feng  MIN Fan  SUN Yuanqiu
Affiliations: Network & Information Center, Southwest Petroleum University, Chengdu 610500, P. R. China; School of Computer Sciences, Southwest Petroleum University, Chengdu 610500, P. R. China; Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, P. R. China; School of Computer Sciences, Southwest Petroleum University, Chengdu 610500, P. R. China
Funding: National Natural Science Foundation of China (62006200); Natural Science Foundation of Sichuan Province (2019YJ0314); Sichuan Youth Science and Technology Innovation Team (2019JDTD0017); Southwest Petroleum University Extracurricular Open Experiment Program (2020KSP61001)
Abstract: The representative-based classification algorithm built on neighborhood-covering rough sets performs well on some data sets, but class imbalance in the data severely degrades its classification accuracy. To mitigate the effect of class imbalance as far as possible, three ensemble strategies for this algorithm are proposed on top of k-fold cross-validation. Strategy 1 relies on k-fold cross-validation to obtain the corresponding k base classifiers, all of which form a committee that classifies unlabeled samples. Building on strategy 1, strategy 2 selects only the base classifiers with relatively high classification accuracy to form the committee. Building on the first two strategies, strategy 3 applies the idea of active learning to expand the training set and obtain a new classifier, which then classifies the remaining unlabeled samples. Experiments are conducted on UCI benchmark data sets, with a comparison over different values of k. The results show that all three strategies yield improvements to varying degrees, and setting k to 5 consistently achieves a good improvement; for different data sets, the appropriate strategy should be selected.
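As a rough illustration of the first two ensemble strategies described in the abstract (not the paper's implementation: the representative-based neighborhood-covering classifier is replaced here by a simple 1-nearest-neighbor stand-in, and the function names `one_nn_factory` and `kfold_committee` are hypothetical), the k-fold committee can be sketched as:

```python
import numpy as np

def one_nn_factory(X_train, y_train):
    """Stand-in base learner: 1-nearest neighbor. The paper instead uses a
    representative-based neighborhood-covering rough set classifier."""
    def predict(X):
        labels = []
        for x in X:
            d = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
            labels.append(y_train[np.argmin(d)])     # label of the nearest one
        return np.array(labels)
    return predict

def kfold_committee(X, y, k=5, top_m=None, rng=None):
    """Strategy 1: train k base classifiers, one per training split of
    k-fold cross-validation, and let them all vote.
    Strategy 2: keep only the top_m members ranked by accuracy on their
    own held-out fold before voting."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    members = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        clf = one_nn_factory(X[train_idx], y[train_idx])
        acc = np.mean(clf(X[test_idx]) == y[test_idx])  # held-out fold accuracy
        members.append((acc, clf))
    if top_m is not None:  # strategy 2: committee of the strongest members only
        members = sorted(members, key=lambda t: t[0], reverse=True)[:top_m]
    def predict(X_new):
        votes = np.stack([clf(X_new) for _, clf in members])  # (members, samples)
        # majority vote over committee members, per sample
        return np.array([np.bincount(col).argmax() for col in votes.T])
    return predict
```

Strategy 3 would go one step further: use the committee's confident predictions to add samples to the training set (the active-learning step), then retrain before classifying the rest.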

Keywords: representative election  rough set  classification  ensemble learning  active learning
Received: 2021-05-16
Revised: 2021-06-01

Indexed by Wanfang Data and other databases.