
BIRCH algorithm improvement and analysis based on threshold value
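Cite this article: SHANG Jiaze, AN Weipeng, GUO Yaodan. BIRCH algorithm improvement and analysis based on threshold value[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2020, 32(3): 487-494.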
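Authors: SHANG Jiaze, AN Weipeng, GUO Yaodan
Institution: Henan Polytechnic University, Jiaozuo 454000, Henan, P. R. China
Funding: Applied Research Program of the Education Department of Henan Province (16A520052)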
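Abstract: Balanced iterative reducing and clustering using hierarchies (BIRCH) is a comprehensive hierarchical clustering algorithm. However, BIRCH applies one uniform spatial threshold to all clusters in the leaf nodes and decides where to insert a data object solely from the distance between the object and each cluster, thereby ignoring the relationships between clusters. In addition, when splitting a node, the algorithm takes the two clustering features (CFs) that are farthest apart as the seeds of the two new sub-clusters and assigns every remaining CF to one of them according to its distance to these two seeds; as a result, samples lying between clusters can be misclassified, and the relationships among CFs are ignored. To address these two problems, a threshold-based adaptive algorithm is proposed to replace the uniform spatial threshold of the original algorithm, and the naive Bayes algorithm is incorporated to account for the relationships between CFs. Simulation experiments comparing the improved BIRCH algorithm with the traditional one show that, at some cost in efficiency, the improved algorithm clearly improves clustering quality; compared with other algorithms, the proposed algorithm also performs well and is robust across datasets.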

Keywords: balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm; self-adaptation; threshold; Bayesian algorithm
Received: 2018-12-18
Revised: 2020-02-25
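To make the two behaviours criticized in the abstract concrete, the following is a minimal Python sketch of standard BIRCH leaf-level logic, not the authors' improved algorithm. A clustering feature (CF) stores (N, LS, SS); a point is absorbed by its nearest CF only if the merged radius stays within one global threshold, and an overfull leaf is split by seeding two groups with the farthest pair of CFs. The names `CF`, `insert_point` and `split_leaf`, the use of the radius (rather than the diameter) as the threshold test, and centroid Euclidean distance are assumptions made for illustration. The paper's adaptive threshold and naive Bayes step are not reproduced, since the abstract does not describe them.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CF:
    """Clustering feature (N, LS, SS) of a sub-cluster, as in standard BIRCH."""
    n: int            # number of points
    ls: List[float]   # linear sum of the points
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "CF":
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def merged(self, other: "CF") -> "CF":
        # CF additivity: (N1 + N2, LS1 + LS2, SS1 + SS2)
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def radius(self) -> float:
        # R = sqrt(SS/N - ||LS/N||^2): RMS distance of the members to the centroid
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def insert_point(leaf: List[CF], x: Sequence[float], threshold: float) -> None:
    """Absorb x into the nearest CF if the merged radius stays <= threshold,
    otherwise open a new CF. One threshold is shared by every CF (the uniform
    spatial threshold the abstract criticizes)."""
    new = CF.from_point(x)
    if leaf:
        i = min(range(len(leaf)), key=lambda k: math.dist(leaf[k].centroid(), x))
        candidate = leaf[i].merged(new)
        if candidate.radius() <= threshold:
            leaf[i] = candidate
            return
    leaf.append(new)


def split_leaf(entries: List[CF]) -> Tuple[List[CF], List[CF]]:
    """Split a leaf (assumes at least two entries): the two CFs with the
    farthest-apart centroids seed the new nodes, and every other CF joins
    whichever seed centroid is nearer (a distance-only decision)."""
    pairs = [(i, j) for i in range(len(entries)) for j in range(i + 1, len(entries))]
    a, b = max(pairs, key=lambda p: math.dist(entries[p[0]].centroid(),
                                              entries[p[1]].centroid()))
    left, right = [entries[a]], [entries[b]]
    for k, e in enumerate(entries):
        if k in (a, b):
            continue
        da = math.dist(e.centroid(), entries[a].centroid())
        db = math.dist(e.centroid(), entries[b].centroid())
        (left if da <= db else right).append(e)
    return left, right


leaf: List[CF] = []
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5)]:
    insert_point(leaf, p, threshold=0.5)
print(len(leaf))              # → 3
left, right = split_leaf(leaf)
print(len(left), len(right))  # → 2 1
```

With a single threshold of 0.5, the in-between point (2.5, 2.5) forms its own CF and, at split time, is sent to the (0, 0) side only because its centroid is about 3.50 rather than 3.57 away from the two seeds. This is the kind of distance-only decision the abstract describes; the proposed adaptive threshold and naive Bayes step target the threshold test in `insert_point` and the nearest-seed rule in `split_leaf`, respectively, though the abstract does not specify how.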
