
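Research on An Efficient Clustering General Framework in Data Mining
Cite this article: Gao Qin (高芹), Chen Ya (陈亚). Research on An Efficient Clustering General Framework in Data Mining[J]. Science Technology and Engineering (科学技术与工程), 2014, 14(16).
Authors: Gao Qin (高芹), Chen Ya (陈亚)
Affiliations: School of Computer Science, Hubei Polytechnic University; School of Computer Science, Wuhan University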
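Abstract: With the rapid development of sensor and Internet technologies, dataset sizes have surged, while system storage and processing capacity still lag behind. Existing data clustering algorithms require a large number of measurements and incur high time overhead. To solve clustering problems on large datasets efficiently, this paper proposes a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data, preserving algorithm performance while reducing both measurement complexity and runtime complexity. The framework is then examined using the spectral clustering algorithm. Theoretical analysis shows that all clusters of size Ω(log n) can be recovered using O(n log^2 n) similarities, with a running time of O(n log^3 n) for a dataset of n objects. Finally, extensive simulation experiments demonstrate further practical strengths of the proposed framework.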

Keywords: datasets  clustering  measurement  framework  runtime
Received: 2013/12/6
Revised: 2014/2/19

Research on An Efficient Clustering General Framework in Data Mining
Gao Qin and Chen Ya. Research on An Efficient Clustering General Framework in Data Mining[J]. Science Technology and Engineering, 2014, 14(16).
Authors:Gao Qin and Chen Ya
Abstract: Advances in sensing technologies and the growth of the Internet have resulted in an explosion in the size of datasets, while storage and processing power continue to lag behind. Current data clustering algorithms require a large number of measurements and long running times. To solve clustering problems on large datasets efficiently, we propose a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data and comes with guarantees on performance, measurement complexity, and runtime complexity. We instantiate this framework with the spectral clustering algorithm and provide concrete results on its performance. Theoretical analysis shows that the resulting algorithm recovers all clusters of size Ω(log n) using O(n log^2 n) similarities and runs in O(n log^3 n) time for a dataset of n objects. Through extensive experimentation, we also demonstrate that the framework is attractive in practice.
Keywords:datasets  clustering  measurement  framework  runtime
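
The idea summarized in the abstract (cluster a small random subset, then place the remaining objects using only their similarities to that subset, and recurse) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-way split by the sign of the Fiedler vector stands in for "an off-the-shelf clustering algorithm", the assignment rule (highest average similarity to a seed group) is an assumption, and every name here (`CountingSimilarity`, `spectral_bipartition`, `active_split`, `active_hierarchical_clustering`, `m`, `min_size`) is hypothetical.

```python
import numpy as np


class CountingSimilarity:
    """Wraps a similarity matrix and counts how many entries are queried."""

    def __init__(self, matrix):
        self.matrix = matrix
        self.queries = 0

    def __call__(self, i, j):
        self.queries += 1
        return self.matrix[i, j]


def spectral_bipartition(S):
    """Two-way split by the sign of the Fiedler vector of L = D - S."""
    laplacian = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    return vecs[:, 1] >= 0


def active_split(items, sim, m, rng):
    """Cluster a random subset of at most m items (m >= 2), then place every
    other item with the seed group it is most similar to on average."""
    k = min(m, len(items))
    seeds = [int(x) for x in rng.choice(items, size=k, replace=False)]
    S = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):  # query each seed pair once
            S[a, b] = S[b, a] = sim(seeds[a], seeds[b])
    mask = spectral_bipartition(S)
    left_seeds = [s for s, flag in zip(seeds, mask) if flag]
    right_seeds = [s for s, flag in zip(seeds, mask) if not flag]
    if not left_seeds or not right_seeds:
        return None  # degenerate split: treat the items as a single cluster
    left, right = list(left_seeds), list(right_seeds)
    seed_set = set(seeds)
    for x in items:
        if x in seed_set:
            continue
        to_left = np.mean([sim(x, s) for s in left_seeds])
        to_right = np.mean([sim(x, s) for s in right_seeds])
        (left if to_left >= to_right else right).append(x)
    return left, right


def active_hierarchical_clustering(items, sim, m, min_size, rng):
    """Return a nested tuple tree whose leaves are lists of item indices."""
    items = list(items)
    if len(items) <= min_size or len(items) < 2:
        return items
    split = active_split(items, sim, m, rng)
    if split is None:
        return items
    left, right = split
    return (active_hierarchical_clustering(left, sim, m, min_size, rng),
            active_hierarchical_clustering(right, sim, m, min_size, rng))


# Toy data (assumed): two groups of 20 objects, similarity 1.0 within a group
# and 0.1 across groups.
n = 40
labels = np.array([0] * 20 + [1] * 20)
matrix = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
sim = CountingSimilarity(matrix)
tree = active_hierarchical_clustering(range(n), sim, m=12, min_size=20,
                                      rng=np.random.default_rng(0))
```

On this toy input, `sim.queries` can be compared against the n(n-1)/2 pairwise similarities a non-active method would need. The subset size `m` controls the trade-off: the abstract's bounds suggest subsets that grow only logarithmically with n, whereas the fixed `m=12` here is chosen purely for illustration.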
This article is indexed in CNKI and other databases.