
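K-means Algorithm Based on Outlier Detection
Cite this article:LENG Yong-lin, ZHANG Qing-chen, ZHAO Liang, LU Fu-yu. K-means Algorithm Based on Outlier Detection[J]. Journal of Jinzhou Normal College (Natural Science Edition), 2014(1): 34-38, 48.
Authors:LENG Yong-lin  ZHANG Qing-chen  ZHAO Liang  LU Fu-yu
Affiliations:[1] Higher Vocational College, Bohai University, Jinzhou, Liaoning 121001 [2] School of Software, Dalian University of Technology, Dalian, Liaoning 116621
Funding:Liaoning Provincial Department of Science and Technology Project (No. 2013020014); Planning Project of the China Higher Vocational and Technical Education Research Association (No. GZYLX2011211); Liaoning Provincial Education Science "二五" Plan (No. JG12DB211).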
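Abstract:The K-means algorithm is widely used in practice because it is simple and fast. However, the traditional K-means algorithm is sensitive to noise, which makes clustering results unstable and lowers clustering accuracy. To address this problem, a K-means algorithm based on outlier detection is proposed. The algorithm first detects the outliers in the dataset and avoids choosing them when selecting the initial seeds. Once the non-outlier points have been clustered, each outlier is assigned to a cluster according to its distance to each cluster. This reduces the influence of outliers on K-means and improves the accuracy of the clustering results. Experiments on standard UCI datasets show that, when the number of clusters is given, the algorithm reduces the influence of outliers on K-means and improves both the precision and the stability of the clustering.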

Keywords:clustering  K-means algorithm  outlier  UCI dataset

K-means Algorithm Based on Outliers Detection
LENG Yong-lin, ZHANG Qing-chen, ZHAO Liang, LU Fu-yu. K-means Algorithm Based on Outliers Detection[J]. Journal of Jinzhou Normal College (Natural Science Edition), 2014(1): 34-38, 48.
Authors:LENG Yong-lin  ZHANG Qing-chen  ZHAO Liang  LU Fu-yu
Institution:1. Higher Professional Technical Institute, Bohai University, Jinzhou 121013, China; 2. School of Software Technology, Dalian University of Technology, Dalian 116621, China
Abstract:The K-means algorithm is widely used in practice because it is simple and fast. However, the traditional K-means algorithm is sensitive to outliers, which makes the clustering results unstable and lowers clustering accuracy. To address this problem, the paper proposes a K-means algorithm based on outlier detection. The algorithm first detects the outliers in the given dataset so that they are not selected as initial seeds. After clustering all objects that are not outliers, it assigns each outlier to a cluster according to the distance between the outlier and each cluster. In this way the algorithm reduces the impact of outliers on traditional K-means and improves clustering accuracy. Experiments on standard UCI datasets, with the number of clusters given, indicate that the algorithm reduces the influence of outliers on K-means and improves the accuracy and stability of the clustering.
Keywords:clustering  K-means algorithm  outlier  UCI dataset
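
The three steps named in the abstract (detect outliers, cluster only the non-outliers, then attach each outlier to a cluster by distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which outlier detector or which outlier-to-cluster distance the paper uses, so the mean distance to the nearest neighbors and the distance to the cluster centroid are both assumptions made here, and the names `detect_outliers`, `outlier_aware_kmeans`, `n_neighbors`, and `contamination` are hypothetical.

```python
import numpy as np

def detect_outliers(X, n_neighbors=5, contamination=0.05):
    """Flag the points with the largest mean distance to their nearest neighbors.

    Assumption: this k-nearest-neighbor score stands in for whatever
    detector the paper uses, which the abstract does not specify.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dist.sort(axis=1)                      # column 0 is the zero self-distance
    score = dist[:, 1:n_neighbors + 1].mean(axis=1)
    n_out = int(np.ceil(contamination * len(X)))
    mask = np.zeros(len(X), dtype=bool)
    if n_out > 0:
        mask[np.argsort(score)[-n_out:]] = True
    return mask

def nearest_center(X, centers):
    """Index of the closest center for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations with seeds drawn at random from X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest_center(X, centers)

def outlier_aware_kmeans(X, k, seed=0, **detector_kwargs):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: detect outliers so that none can be drawn as an initial seed.
    is_outlier = detect_outliers(X, **detector_kwargs)
    # Step 2: cluster the non-outliers only.
    centers, inlier_labels = kmeans(X[~is_outlier], k, rng)
    labels = np.empty(len(X), dtype=int)
    labels[~is_outlier] = inlier_labels
    # Step 3: attach each outlier to its nearest cluster
    # (assumption: distance to the centroid).
    if is_outlier.any():
        labels[is_outlier] = nearest_center(X[is_outlier], centers)
    return centers, labels, is_outlier
```

Because the seeds are drawn only from the non-outlier points, an isolated point can neither become an initial center nor pull a centroid toward itself during the iterations; it only receives a label at the end. The `contamination` fraction fixes how many points are treated as outliers, which is a simplification chosen for this sketch.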
This article is indexed in VIP (维普) and other databases.