
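HABOS clustering algorithm for categorical data
Cite this article: WU Sen, JIANG Dan-dan, WANG Qiang. HABOS clustering algorithm for categorical data[J]. Journal of University of Science and Technology Beijing, 2016(7):1017-1024.
Authors: WU Sen, JIANG Dan-dan, WANG Qiang
Affiliation: Dongling School of Economics and Management, University of Science and Technology Beijing, Beijing 100083
Funding: National Natural Science Foundation of China (71271027); Specialized Research Fund for the Doctoral Program of Higher Education (20120006110037)
Abstract: CABOSFV_C is an efficient clustering algorithm for high-dimensional data with categorical attributes. It computes distances with the set sparse feature dissimilarity and compresses data with sparse feature vectors. Its clustering quality is affected by the upper-limit parameter of the set sparse feature dissimilarity, and there is no clear guidance for choosing this parameter. To address this problem, a heuristic hierarchical clustering algorithm of categorical data based on sparse feature dissimilarity (HABOS) is proposed. Starting from the idea of agglomerative hierarchical clustering and constrained by an upper limit on the number of clusters, the method applies a new internal clustering validation index based on sparse feature dissimilarity (CVISFD) as a heuristic measure, and thereby selects the clustering level automatically. Experimental results on UCI benchmark data sets show that HABOS effectively improves clustering accuracy and stability.

Keywords: data mining; clustering algorithms; categorical data; attributes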

HABOS clustering algorithm for categorical data
WU Sen, JIANG Dan-dan, WANG Qiang. HABOS clustering algorithm for categorical data[J]. Journal of University of Science and Technology Beijing, 2016(7):1017-1024.
Authors: WU Sen, JIANG Dan-dan, WANG Qiang
Abstract: The clustering algorithm based on sparse feature vector for categorical attributes (CABOSFV_C) is an efficient clustering method for high-dimensional categorical data. It uses sparse feature dissimilarity (SFD) to calculate distance and sparse feature vectors to compress the data. However, its clustering quality depends on the SFD upper limit parameter, and there is no guidance for configuring that parameter. To address this sensitivity, this paper proposes a new heuristic hierarchical clustering algorithm of categorical data based on SFD (HABOS). Under the constraint of an upper limit on the number of clusters, the algorithm applies agglomerative hierarchical clustering and uses a new internal clustering validation index based on SFD (CVISFD) as a heuristic measure to choose the best clustering level. Three UCI benchmark data sets were used to compare the improved algorithm with the traditional ones. The empirical tests show that HABOS effectively increases clustering accuracy and stability.
Keywords: data mining; clustering algorithms; categorical data; attributes
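
The abstract names two ingredients, a set-based dissimilarity (SFD) and agglomerative merging under a limit on the number of clusters, without giving formulas. The sketch below is only an illustration of how those pieces might fit together, not the authors' implementation. It assumes SFD for a set of objects is e / (n · a), where n is the set size, a is the number of attributes on which all objects agree, and e is the number of remaining attributes; this formula is an assumption and is not stated in the abstract. The names `sfd` and `agglomerate` are hypothetical. The CVISFD index is not defined here, so the automatic choice of clustering level is omitted and a fixed target `k` stands in for it.

```python
from itertools import combinations

def sfd(rows):
    """Assumed set dissimilarity e / (n * a); not taken from the abstract.
    a = attributes on which every row agrees, e = all other attributes."""
    n = len(rows)
    a = sum(1 for col in zip(*rows) if len(set(col)) == 1)
    e = len(rows[0]) - a
    return float("inf") if a == 0 else e / (n * a)

def agglomerate(data, k):
    """Greedy agglomerative merging down to k clusters (illustrative only)."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # Merge the pair of clusters whose union has the smallest assumed SFD.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: sfd([data[r] for r in clusters[p[0]] + clusters[p[1]]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy categorical records (made up for illustration).
data = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
]
print(agglomerate(data, 2))
```

On this toy input the two "red, round" records end up in one cluster and the two "blue, square" records in the other. A fuller version would record every level of the hierarchy and score each with a validation index, which is the role the abstract assigns to CVISFD.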
This article is indexed in Wanfang Data and other databases.