A clustering algorithm for scalable datasets based on semi-supervision technology

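Cite this article:	Shen Yan, Song Shun-Lin, Zhu Yu-Quan. A clustering algorithm for scalable datasets based on semi-supervision technology [J]. Journal of Nanjing University (Natural Science Edition), 2011(4).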

Authors:	Shen Yan, Song Shun-Lin, Zhu Yu-Quan

Affiliation:	School of Computer Science and Communication Engineering, Jiangsu University

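Funding:	National Key Technology R&D Program of China (2010BAI88B00); Natural Science Foundation of Jiangsu Province (BK2010331); Doctoral Student Innovation Program (CX10B-016X); Jiangsu University Senior Talent Fund (08JDG057)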

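Abstract:	As the datasets to be mined keep growing, earlier clustering algorithms are no longer applicable because they must scan the original dataset multiple times. Algorithms that complete clustering in a single scan of the original dataset have therefore become the primary research goal. However, existing algorithms for large-scale datasets are easily affected by their initialization parameters and by the distribution of the original dataset, so the quality of their clustering results is low and unstable. To address this, the paper draws on the idea of semi-supervised clustering and proposes a semi-supervised single-scan K-means algorithm based on a labeled set. The algorithm uses a labeled set resident in main memory to guide the clustering process, which further improves both clustering efficiency and the quality of the clustering results. The effectiveness of the algorithm is verified on synthetically generated datasets and on the 1998 KDD dataset.
Keywords:	large-scale datasets; clustering; semi-supervised clustering; clustering data compression; data mining; K-means clustering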
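
The idea summarized in the abstract (a labeled set kept in main memory guides a single scan over the data) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It is a simplified stand-in that assumes the labeled set is used only to seed the centroids, and that each streamed point then updates its nearest centroid incrementally (sequential K-means). The function names `seed_centroids` and `one_pass_kmeans` are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def seed_centroids(
    labeled: Iterable[Tuple[Point, Hashable]]
) -> Tuple[Dict[Hashable, List[float]], Dict[Hashable, int]]:
    """Average the in-memory labeled points of each class into one centroid.

    Assumption: one cluster per label; the paper may use the labeled set
    differently.
    """
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for point, label in labeled:
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        for i, x in enumerate(point):
            sums[label][i] += x
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
    return centroids, counts


def one_pass_kmeans(
    labeled: Iterable[Tuple[Point, Hashable]],
    stream: Iterable[Point],
) -> Tuple[Dict[Hashable, List[float]], List[Hashable]]:
    """Cluster `stream` in a single scan, starting from label-seeded centroids."""
    centroids, counts = seed_centroids(labeled)
    assignments: List[Hashable] = []
    for point in stream:  # each unlabeled point is read exactly once
        nearest = min(centroids, key=lambda lab: math.dist(centroids[lab], point))
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        centroid = centroids[nearest]
        # Running-mean update: the centroid stays the mean of all points
        # (labeled and streamed) assigned to it so far.
        for i, x in enumerate(point):
            centroid[i] += rate * (x - centroid[i])
        assignments.append(nearest)
    return centroids, assignments
```

Because the centroids start from labeled examples instead of random points, this sketch does not depend on a random initialization, which is the kind of sensitivity the abstract attributes to existing single-scan methods. `stream` can be any iterator, such as a generator reading records from disk, so the unlabeled data never has to fit in memory.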
A clustering algorithm for scalable datasets based on semi-supervision technology

Shen Yan, Song Shun-Lin, Zhu Yu-Quan. A clustering algorithm for scalable datasets based on semi-supervision technology [J]. Journal of Nanjing University: Nat Sci Ed, 2011(4).

Authors:	Shen Yan, Song Shun-Lin, Zhu Yu-Quan

Institution:	School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, China

Abstract:	As the size of datasets to be mined keeps increasing, traditional clustering algorithms are no longer suitable because they scan the original datasets repeatedly. Clustering algorithms that scan the original scalable datasets just once have therefore become a main target of study. However, such algorithms for scalable datasets are easily affected by initial parameters and by the distribution of the original datasets; hence, the quality of their results is both low and unstable. Therefore, integrating th...

Keywords:	scalable datasets; clustering; semi-supervised clustering; mined data compression; data mining; k-means
This article is indexed in CNKI and other databases.