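Experimental Study of the Defects of the K-Means Clustering Algorithm in Data Mining and the Improvement of Its Efficiency
Cite this article: CHEN Xiao-yong, GU Hui, PENG Zhi-juan. Experimental Study of the Defects of the K-Means Clustering Algorithm in Data Mining and the Improvement of Its Efficiency [J]. Science Technology and Engineering (科学技术与工程), 2013, 13(34).
Authors: CHEN Xiao-yong (陈晓勇), GU Hui (顾晖), PENG Zhi-juan (彭志娟)
Institution: Nantong University, Nantong University, Nantong University
Abstract: The K-means clustering algorithm has achieved a measure of success among the cluster-analysis methods currently used in data mining. To further improve its use in data preprocessing and in neural network structures, this paper studies the algorithm's defects. The main work is as follows. First, the idea and main flow of the K-means algorithm are analyzed. Second, the algorithm's advantages are identified: it is simple and fast, and it produces dense clusters that are clearly distinguishable from one another. Third, its disadvantages are identified: it is poorly suited to data with symbolic attributes; the value of k (the number of clusters to generate) must be given in advance; it is strongly affected by "noisy data" and isolated points; and the cluster centers must be recomputed and updated repeatedly. The experimental verification shows that choosing different initial values has little effect on the clustering result; that when clustering a data set takes many iterations, changing the input order of the data is worth trying; and that changing the input order of the data set directly affects the clustering result. The experimental results are a useful reference for improving the efficiency of the K-means algorithm, and the study has some significance for improving data mining techniques.

Keywords: K-means; clustering algorithm; noisy data; iteration; work efficiency
Received: 7/9/2013
Revised: 8/2/2013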
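
The main flow of K-means mentioned in the abstract (assign each point to its nearest center, recompute the centers, repeat until nothing changes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `kmeans` and `sse`, the choice of the first k points as initial centers, and the four sample points are all assumptions made for the example. Under that initialization, reordering the input changes the starting centers, which is one way the input order can affect the result.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means on a list of 2-D tuples.

    Assumption (not from the paper): the first k points are used as the
    initial centers, so the outcome depends on the input order.
    """
    centers = [points[i] for i in range(k)]
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, clusters, iteration


def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(
        math.dist(p, c) ** 2
        for c, members in zip(centers, clusters)
        for p in members
    )


# Hypothetical data: the four corners of a 4 x 1 rectangle, in two orders.
order_a = [(0, 0), (0, 1), (4, 0), (4, 1)]
order_b = [(0, 0), (4, 0), (0, 1), (4, 1)]

centers_a, clusters_a, iters_a = kmeans(order_a, k=2)
centers_b, clusters_b, iters_b = kmeans(order_b, k=2)

print(clusters_a, sse(centers_a, clusters_a))
print(clusters_b, sse(centers_b, clusters_b))
```

With the first ordering the points are split into the bottom and top rows; with the second they are split into the left and right pairs, which gives a much lower sum of squared errors on the same data.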

Experimental Research on the Defects of the K-Means Clustering Algorithm in Data Mining and on Improving Its Efficiency
CHEN Xiao-yong, GU Hui and PENG Zhi-juan. Experimental Research on the Defects of the K-Means Clustering Algorithm in Data Mining and on Improving Its Efficiency [J]. Science Technology and Engineering, 2013, 13(34).
Authors: CHEN Xiao-yong, GU Hui and PENG Zhi-juan
Institution: Nantong University, Nantong University, Nantong University
Abstract: K-means clustering has had some success as a cluster-analysis method in data mining. To improve its application in data preprocessing and neural network structures, this paper examines the algorithm's defects. The work covers three areas: an analysis of the idea and main flow of the K-means algorithm; a summary of its benefits, namely that it is simple and fast and yields dense, clearly separated clusters; and a summary of its shortcomings, namely that it handles symbolic-attribute data poorly, requires the value of k (the number of clusters to generate) in advance, is sensitive to "noise data" and isolated points, and must continually recompute the adjusted cluster centers. The experiments indicate that selecting different initial values has little effect on the clustering results, that changing the input order of the data is worth trying when a data set needs many iterations, and that changing the input order of the data set directly affects the clustering results. The results offer a clear reference for improving the efficiency of the K-means algorithm, and the study has some significance for the improvement of data mining technology.
Keywords: K-means; clustering algorithm; noise data; iteration; work efficiency
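
The sensitivity to noise data and isolated points noted in the abstract follows from the center being an arithmetic mean. The sketch below illustrates this in one dimension; the helper name `center_shift` and the sample values are assumptions made for the example, and the median is shown only as a point of comparison, not as a method taken from the paper.

```python
from statistics import mean, median


def center_shift(cluster, outlier, center=mean):
    """Return how far the cluster center moves when one outlier joins it."""
    return abs(center(cluster + [outlier]) - center(cluster))


# Hypothetical 1-D cluster and a single isolated point far away from it.
cluster = [1.0, 2.0, 3.0]
outlier = 100.0

mean_shift = center_shift(cluster, outlier)             # mean-based center
median_shift = center_shift(cluster, outlier, median)   # median, for comparison

print(mean_shift, median_shift)
```

A single isolated point moves the mean-based center far outside the original cluster, while the median barely moves.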