
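Semi-supervised Deep Embedded Clustering of High Dimensional Data Based on Local Structure Preservation
Cite this article: CAO Chao, LI Mengli, YANG Shuhong, LI Chungui. Semi-supervised Deep Embedded Clustering of High Dimensional Data Based on Local Structure Preservation[J]. Guangxi Sciences, 2022, 29(5): 922-929.
Authors: CAO Chao, LI Mengli, YANG Shuhong, LI Chungui
Institution: School of Electrical Electronics and Computer Science, Guangxi University of Science and Technology, Liuzhou, Guangxi 545006, China
Funding: Supported by the National Natural Science Foundation of China (62061003, 62062010), the Guangxi Natural Science Foundation (2019GXNSFAA245049), the Guangxi Science and Technology Plan Project (Guike AD19245101), and the Guangxi College Students' Innovation and Entrepreneurship Training Program (201910594057).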
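Abstract: Clustering is an important topic in machine learning and data mining. In recent years, deep neural networks (DNN) have received extensive attention in a variety of clustering tasks. Semi-supervised clustering in particular can significantly improve clustering performance by introducing only a small amount of prior information into a large amount of unlabeled data. However, these clustering methods overlook the fact that the defined clustering loss may distort the feature space, producing non-representative, meaningless features. To address the insufficient preservation of local structure during feature learning in existing semi-supervised deep clustering, this article proposes an Improved Semi-supervised Deep Embedded Clustering (ISDEC) algorithm, which uses an under-complete autoencoder to preserve the intrinsic local structure of the data while learning feature representations. Cluster label assignment and feature representation are jointly optimized by combining a clustering loss, a pairwise constraint loss, and a reconstruction loss. Experimental results on several high-dimensional datasets, including gene data, show that the proposed method achieves better clustering performance than existing methods.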
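The abstract names the three loss terms but does not give their formulas. The following minimal Python sketch (standard library only) evaluates one plausible version of such a joint objective under stated assumptions, and is not the authors' ISDEC implementation: the clustering loss follows the DEC-style Student's t soft assignment with a KL divergence to a sharpened target distribution; the under-complete autoencoder is reduced to a single tied-weight linear layer; the pairwise constraint loss is a hypothetical must-link/cannot-link penalty on embedding distances with a margin; and the weights `gamma` and `lam`, like every function name here, are illustrative. The sketch only computes the objective; in practice the encoder, decoder, and cluster centers would be trained by gradient descent with an automatic-differentiation framework.

```python
import math

# Hypothetical sketch of an ISDEC-style joint objective (not the authors' code).


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(W, x):
    """Linear under-complete encoder z = W x (W has fewer rows than len(x))."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(W, z):
    """Tied-weight linear decoder x_hat = W^T z (an illustrative simplification)."""
    return [sum(W[k][j] * z[k] for k in range(len(W))) for j in range(len(W[0]))]


def soft_assign(Z, centers, alpha=1.0):
    """DEC-style Student's t soft assignment q_ij of embedding i to center j."""
    Q = []
    for z in Z:
        row = [(1.0 + sq_dist(z, mu) / alpha) ** (-(alpha + 1.0) / 2.0) for mu in centers]
        s = sum(row)
        Q.append([v / s for v in row])
    return Q


def target_distribution(Q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(row[j] for row in Q) for j in range(len(Q[0]))]
    P = []
    for row in Q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        P.append([v / s for v in w])
    return P


def kl_divergence(P, Q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q)
               for p, q in zip(prow, qrow) if p > 0)


def pairwise_loss(Z, must_link, cannot_link, margin=1.0):
    """Hypothetical constraint loss: pull must-link pairs together and push
    cannot-link pairs at least `margin` apart (hinge), averaged over pairs."""
    total = sum(sq_dist(Z[i], Z[j]) for i, j in must_link)
    total += sum(max(0.0, margin - math.sqrt(sq_dist(Z[i], Z[j]))) ** 2
                 for i, j in cannot_link)
    n_pairs = len(must_link) + len(cannot_link)
    return total / n_pairs if n_pairs else 0.0


def joint_loss(X, W, centers, must_link, cannot_link, gamma=0.1, lam=1.0, margin=1.0):
    """L = L_rec + gamma * L_clu + lam * L_pair (weights are illustrative)."""
    Z = [encode(W, x) for x in X]
    X_hat = [decode(W, z) for z in Z]
    n = len(X)
    l_rec = sum(sq_dist(x, xh) for x, xh in zip(X, X_hat)) / n
    Q = soft_assign(Z, centers)
    P = target_distribution(Q)  # treated as a fixed target during a training step
    l_clu = kl_divergence(P, Q) / n
    l_pair = pairwise_loss(Z, must_link, cannot_link, margin)
    total = l_rec + gamma * l_clu + lam * l_pair
    return {"rec": l_rec, "clu": l_clu, "pair": l_pair, "total": total, "q": Q}


# Toy data: four 4-D samples forming two groups, embedded into 2-D.
X = [[1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0], [0.1, 1.0, 0.0, 0.9]]
W = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
CENTERS = [[1.0, 0.0], [0.0, 1.0]]
MUST_LINK = [(0, 1)]    # prior knowledge: samples 0 and 1 share a cluster
CANNOT_LINK = [(0, 2)]  # prior knowledge: samples 0 and 2 do not

out = joint_loss(X, W, CENTERS, MUST_LINK, CANNOT_LINK)
labels = [max(range(len(CENTERS)), key=lambda j: row[j]) for row in out["q"]]
print(labels, {k: round(v, 4) for k, v in out.items() if k != "q"})
```

In this toy run the must-link pair (0, 1) contributes a small distance penalty, while the cannot-link pair (0, 2) is already farther apart than the margin and adds nothing. The reconstruction term is what ties the embedding back to the input data, which is the role the abstract assigns to the under-complete autoencoder in preserving local structure.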

Keywords: clustering; semi-supervised; deep embedding; gene; representation learning
Received: 2021-11-19
Revised: 2022-02-10
