
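A Cluster Algorithm for Uncertain Data Stream
Cite this article: HAN Dong-hong, WANG Kun, SHAO Chong-lei, MA Chang. A Cluster Algorithm for Uncertain Data Stream[J]. Journal of Northeastern University (Natural Science), 2016, 37(12): 1677-1682. DOI: 10.12068/j.issn.1005-3026.2016.12.002
Authors: HAN Dong-hong  WANG Kun  SHAO Chong-lei  MA Chang
Affiliations: (1. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, Liaoning, China; 2. School of Mechanical Engineering, Shenyang Ligong University, Shenyang 110159, Liaoning, China)
Funding: Supported by the National Natural Science Foundation of China (61173029, 61332006, 61672144).
Abstract: Uncertain streaming data produced by sensors, mobile phone devices, social networks, and similar sources are an important part of big data. Because such data have a variable flow rate, a very large volume, a single-pass scanning constraint, and inherent uncertainty, traditional clustering algorithms cannot satisfy users' demands for efficient, real-time queries. First, the MBR (minimum bounding rectangle) is used to describe the distribution of an uncertain tuple, and a clustering algorithm for uncertain data streams based on expected distance is proposed; it computes the upper and lower bounds of the expected-distance range and prunes distant clusters to reduce computation. Second, the concept of a cluster MBR is introduced to capture the distribution of the tuples within a cluster, and a clustering algorithm based on spatial position relationships is proposed; it uses the spatial relationship between the uncertain tuple's MBR and each cluster MBR to exclude clusters that are far from the tuple, which improves clustering efficiency. Finally, experiments on synthetic and real datasets verify the effectiveness and efficiency of the proposed algorithms.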

Keywords: uncertain data stream  clustering  big data  data mining  minimum bounding rectangle
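
The bound-based pruning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that an uncertain tuple is a set of sample points with probabilities summing to 1, that each cluster is represented by a single center, and that distance is Euclidean; the names `mbr`, `distance_bounds`, `expected_distance`, and `assign` are hypothetical.

```python
import math

def mbr(points):
    """Axis-aligned minimum bounding rectangle, returned as (lows, highs)."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs

def distance_bounds(rect, center):
    """Smallest and largest Euclidean distance from center to the rectangle."""
    lows, highs = rect
    near = far = 0.0
    for lo, hi, c in zip(lows, highs, center):
        gap = max(lo - c, 0.0, c - hi)          # 0 if c lies inside [lo, hi]
        reach = max(abs(c - lo), abs(c - hi))   # distance to the farther side
        near += gap * gap
        far += reach * reach
    return math.sqrt(near), math.sqrt(far)

def expected_distance(samples, probs, center):
    """Probability-weighted mean distance from the samples to center."""
    return sum(p * math.dist(s, center) for s, p in zip(samples, probs))

def assign(samples, probs, centers):
    """Return (index of nearest center, indices that survived pruning)."""
    rect = mbr(samples)
    bounds = [distance_bounds(rect, c) for c in centers]
    best_upper = min(hi for _, hi in bounds)
    # A center whose lower bound exceeds the smallest upper bound
    # cannot have the smallest expected distance, so it is skipped.
    candidates = [i for i, (lo, _) in enumerate(bounds) if lo <= best_upper]
    best = min(candidates,
               key=lambda i: expected_distance(samples, probs, centers[i]))
    return best, candidates
```

Because every sample lies inside the tuple's MBR, the expected distance to a center always falls between the two bounds, so the filter never discards the true nearest center. The exact expected distance is then computed only for the surviving candidates.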

A Cluster Algorithm for Uncertain Data Stream
HAN Dong-hong, WANG Kun, SHAO Chong-lei, MA Chang. A Cluster Algorithm for Uncertain Data Stream[J]. Journal of Northeastern University (Natural Science), 2016, 37(12): 1677-1682. DOI: 10.12068/j.issn.1005-3026.2016.12.002
Authors:HAN Dong-hong  WANG Kun  SHAO Chong-lei  MA Chang
Affiliation: 1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China; 2. School of Mechanical Engineering, Shenyang Ligong University, Shenyang 110159, China.
Abstract: Uncertain streaming data generated by sensors, mobile phones, social networks, and similar sources are an important component of big data. They are characterized by variable arrival rates, large scale, single-pass scanning, and uncertainty, so traditional clustering algorithms cannot meet users' requirements for efficient real-time queries. First, the MBR (minimum bounding rectangle) is used to describe the distribution of uncertain tuples, and a clustering algorithm based on expected distance is proposed for uncertain data streams; it computes upper and lower bounds on the expected distance and uses them to filter out distant clusters. Second, the concept of a cluster MBR, based on the distribution of the tuples in a cluster, is presented, together with a clustering algorithm that excludes clusters far from an uncertain tuple according to the spatial relationship between the tuple's MBR and the cluster MBRs, thereby improving clustering efficiency. Finally, experiments on synthetic and real datasets verify that the proposed algorithms are effective and efficient.
Keywords:uncertain data stream  cluster  big data  data mining  MBR (minimum bounding rectangle)  
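
As a rough illustration of excluding clusters by the spatial relationship between a tuple MBR and the cluster MBRs, the following Python sketch compares rectangles directly. The abstract does not spell out the paper's exact exclusion rule, so the rule used here is an assumption: a cluster is dropped when even its closest possible point is farther away than the farthest possible point of some other cluster. Rectangles are `(lows, highs)` pairs, and the function names are hypothetical.

```python
import math

def rect_min_dist(a, b):
    """Smallest Euclidean distance between two axis-aligned rectangles."""
    (alo, ahi), (blo, bhi) = a, b
    total = 0.0
    for al, ah, bl, bh in zip(alo, ahi, blo, bhi):
        gap = max(bl - ah, al - bh, 0.0)   # 0 when the intervals overlap
        total += gap * gap
    return math.sqrt(total)

def rect_max_dist(a, b):
    """Largest Euclidean distance between points of two rectangles."""
    (alo, ahi), (blo, bhi) = a, b
    total = 0.0
    for al, ah, bl, bh in zip(alo, ahi, blo, bhi):
        reach = max(bh - al, ah - bl)      # farthest pair along this axis
        total += reach * reach
    return math.sqrt(total)

def surviving_clusters(tuple_rect, cluster_rects):
    """Indices of clusters that cannot be ruled out by their MBRs alone."""
    cutoff = min(rect_max_dist(tuple_rect, r) for r in cluster_rects)
    return [i for i, r in enumerate(cluster_rects)
            if rect_min_dist(tuple_rect, r) <= cutoff]

def extend(cluster_rect, tuple_rect):
    """Grow a cluster MBR so that it also covers an absorbed tuple MBR."""
    (clo, chi), (tlo, thi) = cluster_rect, tuple_rect
    return tuple(map(min, clo, tlo)), tuple(map(max, chi, thi))
```

Only rectangle corners are compared, so the filter costs a few arithmetic operations per dimension and per cluster, regardless of how many tuples each cluster holds.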
This article is indexed in CNKI and other databases.