Mixed-Attribute Data Clustering Method Based on a Generalized Linear Model
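Cite this article: Zhu Yongjie. Mixed-attribute data clustering method based on a generalized linear model[J]. Science Technology and Engineering, 2021, 21(4): 1448-1453.
Author: Zhu Yongjie
Affiliation: Information Management Center, Xuchang University, Xuchang 461099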
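Funding: Key project of the Science and Technology Office of Xuchang University, "Deep-learning-based detection of student targets in the classroom" (No. 2019044).
First author: Zhu Yongjie (born June 1981), male, Han, from Xuchang, Henan; master's degree; intermediate-grade laboratory technician. Research interests: computer networks, information security. E-mail: charles_131@126.com.
*Corresponding author: Zhu Yongjie (as above). E-mail: charles_131@126.com.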
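Abstract: Clustering mixed-attribute data is difficult. To address this problem, a mixed-attribute data clustering method based on a generalized linear model is proposed. First, a low-order multivariate generalized linear model is constructed to handle the clustering of massive data, and an attribute time-series matrix is obtained by taking the temporal characteristics of the data attributes into account. Then, this attribute time-series matrix is incorporated when the mixed-attribute data are processed with an optimized K-prototypes clustering method. Finally, taking into account both the distance between samples and cluster centers and the information contained in known samples, an optimization method is used to calculate the data dissimilarity and the distance between samples and cluster sets; the computation terminates when the clustering result stabilizes, and the clustering result is output. Experiments were conducted to verify the effectiveness of the method. The results show that the method produces an optimized partition of mixed-attribute data into cluster sets after relatively few iterations, with clustering fitness values of 0.88 to 0.94, indicating good fitness. The method accurately reflects differences between samples and is a highly accurate clustering method for mixed-attribute data.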
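For reference, the textbook forms of the two building blocks named in the abstract are given below. The abstract does not specify the paper's low-order multivariate model, how the attribute time-series matrix enters the computation, or the optimized dissimilarity, so these are standard definitions (the generalized linear model, and the K-prototypes dissimilarity and cost introduced by Huang), not the author's formulas. The symbols $p$, $m$, $\gamma$, $w_{il}$ and $\delta$ are notation chosen here for illustration.

```latex
% Generalized linear model: response y from an exponential-family distribution,
% link function g, coefficient vector \beta
g\bigl(\mathbb{E}[y \mid \mathbf{x}]\bigr) = \mathbf{x}^{\top}\boldsymbol{\beta}

% Standard K-prototypes dissimilarity between sample X_i and prototype Q_l:
% attributes 1..p are numeric, p+1..m are categorical,
% \gamma weights the categorical part against the numeric part
d(X_i, Q_l) = \sum_{j=1}^{p} (x_{ij} - q_{lj})^2
            + \gamma \sum_{j=p+1}^{m} \delta(x_{ij}, q_{lj}),
\qquad
\delta(a, b) = \begin{cases} 0, & a = b \\ 1, & a \neq b \end{cases}

% Clustering cost minimized by alternating assignment and prototype updates;
% w_{il} = 1 if sample i is assigned to cluster l, otherwise 0
E = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{il}\, d(X_i, Q_l),
\qquad \sum_{l=1}^{k} w_{il} = 1
```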

Keywords: generalized linear model; mixed attributes; data; time-series matrix; K-prototypes clustering; iteration
Received: 2020/4/23
Revised: 2020/7/20

Research on Mixed Attribute Data Clustering Method Based on Generalized Linear Model
Zhu Yongjie. Research on Mixed Attribute Data Clustering Method Based on Generalized Linear Model[J]. Science Technology and Engineering, 2021, 21(4): 1448-1453.
Authors: Zhu Yongjie
Institution: Information Management Center, Xuchang University, Xuchang 461000, China
Abstract: Clustering mixed-attribute data is difficult. To address this, this paper proposes a mixed-attribute data clustering method based on a generalized linear model. First, a low-order multivariate generalized linear model is constructed to handle the clustering of massive data, and an attribute time-series matrix is obtained by considering the temporal characteristics of the attributes. Then, this time-series matrix is taken into account when the mixed-attribute data are processed with an optimized K-prototypes clustering method. Finally, considering both the distance between samples and cluster centers and the information contained in known samples, an optimization method is used to calculate the data dissimilarity and the distance between samples and cluster sets. When the clustering results stabilize, the computation terminates and the clustering results are output. Experiments were conducted to verify the effectiveness of the method. The results show that the method produces an optimized partition of mixed-attribute data into cluster sets within relatively few iterations, achieves clustering fitness values of 0.88 to 0.94, accurately reflects the differences between samples, and is a highly accurate clustering method for mixed-attribute data.
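The iterate-until-stable loop described in the abstract can be illustrated with a minimal sketch of the standard K-prototypes algorithm: squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches, assignment to the least-dissimilar prototype, prototype updates by mean (numeric) and mode (categorical), and termination once assignments stop changing. This is not the paper's optimized variant: the attribute time-series matrix and the known-sample information are not modeled, and the function names `kprototypes` and `kprototypes_dissimilarity`, the parameter `gamma`, and the random initialization are hypothetical choices made for this illustration.

```python
import random
from collections import Counter


def kprototypes_dissimilarity(x_num, x_cat, c_num, c_cat, gamma):
    """Standard K-prototypes dissimilarity (assumed form): squared Euclidean
    distance on numeric attributes plus gamma * categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


def kprototypes(num, cat, k, gamma=1.0, max_iter=100, seed=0):
    """Cluster n mixed-attribute samples into k clusters (hypothetical helper).

    num: list of n numeric-attribute tuples; cat: list of n categorical tuples.
    Returns (labels, numeric_prototypes, categorical_prototypes).
    """
    rng = random.Random(seed)
    n = len(num)
    # Initialize prototypes from k distinct randomly chosen samples.
    init = rng.sample(range(n), k)
    c_num = [list(num[i]) for i in init]
    c_cat = [list(cat[i]) for i in init]
    labels = [None] * n
    for _ in range(max_iter):
        # Assignment step: each sample joins its least-dissimilar prototype.
        new_labels = [
            min(range(k), key=lambda c: kprototypes_dissimilarity(
                num[i], cat[i], c_num[c], c_cat[c], gamma))
            for i in range(n)
        ]
        if new_labels == labels:  # assignments are stable, so stop iterating
            break
        labels = new_labels
        # Update step: mean for numeric attributes, mode for categorical ones.
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the previous prototype if a cluster empties
                continue
            c_num[c] = [sum(num[i][j] for i in members) / len(members)
                        for j in range(len(num[0]))]
            c_cat[c] = [Counter(cat[i][j] for i in members).most_common(1)[0][0]
                        for j in range(len(cat[0]))]
    return labels, c_num, c_cat
```

For example, `kprototypes([(1.0, 2.0), (1.2, 1.9), (8.0, 8.1), (7.9, 8.3)], [("red", "a"), ("red", "a"), ("blue", "b"), ("blue", "b")], k=2, gamma=0.5)` places the first two samples in one cluster and the last two in the other. The choice of `gamma` controls how strongly categorical mismatches count relative to numeric distance; the standard algorithm leaves it as a user-set parameter.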
Keywords: generalized linear model; mixed attributes; data; time series matrix; K-prototypes clustering; iteration
This article is indexed in CNKI, Wanfang Data, and other databases.