
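Argo Profile Data Clustering Based on the DBIRCH Algorithm
Cite this article: WU Man, ZHANG Wanzhen, SUN Miao, LIN Sen. Argo Profile Data Clustering Based on the DBIRCH Algorithm[J]. Journal of Jilin University (Information Science Edition), 2020, 38(5): 568-577.
Authors: WU Man  ZHANG Wanzhen  SUN Miao  LIN Sen
Affiliations: 1. Information Section, Guangxi Zhuang Autonomous Region Academy of Oceanography, Nanning 530022; 2. Technology Innovation Center of Marine Information, Ministry of Natural Resources, Tianjin 300171; 3. Practice Teaching Department, Guilin University of Aerospace Technology, Guilin, Guangxi 541004; 4. School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004
Funding: 2019 Open Fund of the Technology Innovation Center of Marine Information, Ministry of Natural Resources; National Natural Science Foundation of China (61763007; 61866007); Guangxi Science and Technology Major Project (Guike AA18118025)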
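Abstract: Argo buoy profile observations that are analyzed in real time have a high data density, require a fast response, and contain clusters of arbitrary shape. To address these problems, a low-complexity clustering algorithm, DBIRCH (Density-Based Balanced Iterative Reducing and Clustering Using Hierarchies), is proposed that processes a data set effectively in a single scan. The algorithm uses a newly introduced parameter, the density threshold correction factor, to dynamically update the subspace threshold, which is the constraint coefficient that limits the growth of the CF (Clustering Feature) tree. It also draws on the idea of density correlation to build and merge CF trees several times in different neighborhoods, and finally outputs the core CF tree sub-nodes as the clustering result. This avoids the excessive dependence of the BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) algorithm on its parameters. Because the algorithm can handle clusters of arbitrary shape, it improves the overall robustness of data processing, the timeliness of processing Argo profile monitoring data, and the overall throughput of the algorithm. To test its overall performance, real-time monitoring data sets from real Argo buoy profiles were used in several groups of comparative experiments with different parameters, and several evaluation indexes were used to assess running time and clustering accuracy. Viewed as a whole, DBIRCH has the best overall clustering performance among the three algorithms compared: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), BIRCH, and DBIRCH. The experimental results show that, of the three, BIRCH runs fastest but is the least accurate; DBSCAN clusters better than BIRCH but runs slowest; and the improved DBIRCH is slightly less efficient than BIRCH but achieves the highest clustering accuracy.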
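The CF (Clustering Feature) summaries that the abstract refers to can be illustrated with a minimal sketch of the standard BIRCH bookkeeping. It assumes a flat list of leaf entries rather than a full CF tree, and a fixed threshold; the abstract does not specify how the density threshold correction factor updates the threshold, so that step is deliberately left out. The names `ClusteringFeature` and `insert_point` are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Standard BIRCH summary of a sub-cluster: (N, LS, SS)."""
    n: int             # number of points absorbed
    ls: List[float]    # per-dimension linear sum of the points
    ss: float          # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: two summaries combine by component-wise addition.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of members from the centroid.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def insert_point(leaves: List[ClusteringFeature],
                 point: Sequence[float],
                 threshold: float) -> None:
    """Absorb the point into the nearest CF if its radius stays within
    the threshold; otherwise open a new CF (assumed flat leaf list)."""
    new = ClusteringFeature.from_point(point)
    if leaves:
        nearest = min(range(len(leaves)),
                      key=lambda i: math.dist(leaves[i].centroid(), point))
        candidate = leaves[nearest].merged(new)
        if candidate.radius() <= threshold:
            leaves[nearest] = candidate
            return
    leaves.append(new)


# Made-up 2-D points, not Argo data: two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
leaves: List[ClusteringFeature] = []
for p in points:
    insert_point(leaves, p, threshold=0.5)

print([(cf.n, [round(c, 3) for c in cf.centroid()]) for cf in leaves])
```

Each point is read once and only the three summary quantities are kept, which is what makes a single-scan approach possible. The fixed `threshold` here plays the role of the growth constraint that the abstract says DBIRCH adjusts dynamically.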

Keywords: Argo buoy; cluster analysis; BIRCH algorithm; DBSCAN algorithm; DBIRCH algorithm
Received: 2020-04-04

Argo Profile Data Clustering Based on DBIRCH Algorithm
WU Man, ZHANG Wanzhen, SUN Miao, LIN Sen. Argo Profile Data Clustering Based on DBIRCH Algorithm[J]. Journal of Jilin University: Information Sci Ed, 2020, 38(5): 568-577.
Authors:WU Man  ZHANG Wanzhen  SUN Miao  LIN Sen
Institution: 1. Information Department, Guangxi Academy of Oceanography, Nanning 530022, China; 2. Technology Innovation Center of Marine Information, Ministry of Natural Resources, Tianjin 300171, China; 3. Institute of Geography and Oceanography, Guilin University of Aerospace Technology, Guilin 541004, China; 4. School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: Marine Argo buoy profile data that must be analyzed and processed in real time are dense, demand short response times, and contain clusters of arbitrary shape. To address these problems, this paper proposes DBIRCH (Density-Based Balanced Iterative Reducing and Clustering Using Hierarchies), a low-complexity clustering algorithm that can process a data set effectively in a single scan. Using a newly introduced parameter, the density threshold correction factor, the algorithm dynamically updates the subspace threshold, the constraint coefficient that restricts the growth of the CF (Clustering Feature) tree. It also applies the idea of density correlation, building CF trees in different neighborhoods and merging them several times, and finally outputs the core CF tree sub-nodes as the clustering result. This avoids the excessive dependence of the BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) algorithm on its parameters. Because clusters of arbitrary shape can be handled, the approach improves the robustness of data processing, the timeliness of processing Argo profile monitoring data, and the overall throughput of the algorithm. To measure overall performance, real-time monitoring data sets of Argo buoy profiles are used in multiple groups of comparative experiments under different parameters, and several evaluation indexes are used to assess running time and clustering accuracy. Overall, DBIRCH has the best comprehensive clustering performance among the three algorithms compared: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), BIRCH, and DBIRCH. The experimental results show that BIRCH is the fastest of the three but the least accurate; DBSCAN clusters better than BIRCH but is the slowest; and the improved DBIRCH is slightly slower than BIRCH but achieves the highest clustering accuracy.
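The comparison in the abstract rests on two measurements per algorithm: running time and clustering accuracy. The sketch below shows one way such a measurement could be set up. It is an assumption throughout: the abstract does not name its evaluation indexes, so purity is used here as a stand-in accuracy measure, and `threshold_split` is a hypothetical toy clusterer standing in for BIRCH, DBSCAN, or DBIRCH. The data are made up and are not Argo observations.

```python
import time
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def purity(true_labels: Sequence[Hashable],
           cluster_labels: Sequence[Hashable]) -> float:
    """Fraction of points that carry the majority true label of the
    cluster they were assigned to (an assumed accuracy measure)."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and equal length")
    by_cluster: Dict[Hashable, Counter] = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(true_labels)


def timed(cluster_fn: Callable[[Sequence[float]], List[int]],
          data: Sequence[float]) -> Tuple[List[int], float]:
    """Run a clustering function and return (labels, elapsed seconds)."""
    start = time.perf_counter()
    labels = cluster_fn(data)
    return labels, time.perf_counter() - start


def threshold_split(data: Sequence[float]) -> List[int]:
    # Hypothetical stand-in clusterer: splits 1-D values at 5.0.
    return [0 if x < 5.0 else 1 for x in data]


# Made-up 1-D values with known group labels.
data = [0.1, 0.3, 0.2, 4.9, 9.8, 9.9, 10.1]
truth = ["a", "a", "a", "b", "b", "b", "b"]

labels, seconds = timed(threshold_split, data)
score = purity(truth, labels)
print(f"purity={score:.3f}, time={seconds:.6f}s")
```

Running each candidate algorithm through the same `timed` and `purity` calls on the same data gives the kind of speed-versus-accuracy comparison the abstract summarizes.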
Keywords: Argo buoy; cluster analysis; balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm; density-based spatial clustering of applications with noise (DBSCAN) algorithm; density-based balanced iterative reducing and clustering using hierarchies (DBIRCH) algorithm