
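Optimization of embedding-based multi-label classification
Cite this article:XIAO Xue, LIU Yun. Optimization of embedding-based multi-label classification[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2019, 46(5): 94-100. DOI: 10.13543/j.bhxbzr.2019.05.014
Authors:XIAO Xue  LIU Yun
Affiliation:School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China (both authors)
Funding:National Natural Science Foundation of China (61761025)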
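Abstract:In multi-label classification, effectively handling large-scale data sets with many instances and a large number of labels, compensating for labels missing from the training set, and using unlabeled instances to improve prediction performance have become important research directions. An embedding-based multi-label classification (EMC) algorithm is proposed. First, two sets of random transformations are drawn from Gaussian processes (GPs) parameterized by pseudo-instances to model the nonlinear mappings among feature vectors, latent-space representation vectors and label vectors. Second, a set of auxiliary variables is introduced and combined with the expert ensemble (EEOE) method to compensate for missing labels. Finally, unlabeled instances are used to learn smooth mappings of the random functions, which improves prediction performance. Simulation results show that, compared with the feature-aware implicit label space encoding (FaLE) multi-label classification algorithm and the semi-supervised low-rank mapping (SLRM) multi-label classification algorithm, the EMC algorithm is better at handling large-scale data sets, compensating for missing labels and exploiting unlabeled data. This improves class-label prediction performance, and the algorithm also scales well and trains quickly.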

Keywords:multi-label classification  missing labels  embedding-based  scalability
Received:2019-05-05

Optimization of embedding-based multi-label classification
XIAO Xue,LIU Yun. Optimization of embedding-based multi-label classification[J]. Journal of Beijing University of Chemical Technology, 2019, 46(5): 94-100. DOI: 10.13543/j.bhxbzr.2019.05.014
Authors:XIAO Xue  LIU Yun
Affiliation:School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Abstract:Handling large-scale data sets with many instances and a large number of labels, compensating for missing labels in training sets, and using unlabeled instances to improve prediction performance have become important research directions in multi-label classification. This paper proposes an embedding-based multi-label classification (EMC) algorithm. First, two sets of random transformations are drawn from Gaussian processes (GPs) parameterized by pseudo-instances to model the nonlinear mappings between feature vectors, latent space representation vectors and label vectors. Second, a set of auxiliary variables is introduced and combined with an expert ensemble with an overriding expert (EEOE) to compensate for missing labels. Finally, unlabeled instances are used to learn smooth mappings of the random functions, which improves prediction performance. Simulation results show that, compared with the FaLE and SLRM algorithms, the EMC algorithm is better at processing large data sets, compensating for missing labels and using unlabeled data, thereby improving the prediction of class labels, with good scalability and short training time.
Keywords:multi-label classification   missing labels   embedding-based   scalability
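
The abstract does not give the EMC equations, so the following is not the authors' algorithm. As a rough illustration of the general embedding idea it builds on (feature vectors mapped to a latent space, latent vectors mapped to label vectors), here is a minimal linear baseline sketch. It swaps the GP-based nonlinear transformations for a truncated SVD of the label matrix plus ridge regression, and it does not handle missing labels or unlabeled instances. All function names and parameters (`fit_embedding_mlc`, `predict_labels`, `k`, `ridge`, `threshold`) are hypothetical.

```python
import numpy as np


def fit_embedding_mlc(X, Y, k, ridge=1e-3):
    """Fit a linear label-embedding baseline (an assumption, not EMC).

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    k: dimension of the latent space.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Label side: the top-k right singular vectors of Y give a linear
    # encoder/decoder between label vectors and latent vectors.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:k].T                      # (n_labels, k)
    Z = Y @ V                         # latent codes, (n_samples, k)
    # Feature side: ridge regression from features to latent codes.
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Z)  # (d, k)
    return W, V


def predict_labels(X, W, V, threshold=0.5):
    """Map features to the latent space, decode, and threshold the scores."""
    scores = np.asarray(X, dtype=float) @ W @ V.T
    return (scores >= threshold).astype(int)
```

A typical call would be `W, V = fit_embedding_mlc(X_train, Y_train, k=4)` followed by `predict_labels(X_test, W, V)`. Keeping `k` much smaller than the number of labels is what makes embedding approaches attractive for large label sets, since only `k` regressors need to be trained.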