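K-means clustering text mining method using center estimation based on mean density
Cite this article: FU Baolong, ZHANG Aike. K-means clustering text mining method using center estimation based on mean density[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(1): 111-116.
Authors: FU Baolong, ZHANG Aike
Affiliations: Liuzhou Vocational Technological College, Liuzhou, Guangxi 545006; Liuzhou Vocational Technological College, Liuzhou, Guangxi 545006
Funding: Supported by research project funds of the Guangxi Department of Education (201106LX745, 201204LX593)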
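Abstract: Text mining is an important research area within data mining and an important means of retrieving useful textual information. An analysis of the basic principle and implementation steps of the K-means clustering mining method shows that two issues are the technical bottlenecks limiting it: the random selection of initial cluster centers for the iteration, and the outlier (singular point) problem. To address these shortcomings, a K-means clustering text mining method based on mean-density center estimation is proposed. The method replaces random selection of the initial cluster centers with an estimate based on mean density, designs an adaptive mechanism for selecting the neighborhood shape, and uses mean density together with a threshold to eliminate outliers. Experimental results show that the proposed method improves the text mining performance of K-means clustering and substantially raises text mining precision. It outperforms the standard K-means clustering method and also holds some advantage over the recently popular self-organizing neural network clustering method.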

Keywords: data mining; text mining; mean density; clustering center; singular point
Received: 2013-06-06
Revised: 2013-12-15

K-means clustering text mining method using center estimation based on mean density
FU Baolong, ZHANG Aike. K-means clustering text mining method using center estimation based on mean density[J]. Journal of Chongqing University of Posts and Telecommunications, 2014, 26(1): 111-116.
Authors: FU Baolong and ZHANG Aike
Institution: Liuzhou Vocational Technological College, Liuzhou 545006, P.R. China; Liuzhou Vocational Technological College, Liuzhou 545006, P.R. China
Abstract: Text mining is an important research field in data mining and an important means of retrieving useful text information. An analysis of the basic principle and implementation steps of the K-means clustering mining method shows that the random selection of initial cluster centers and the singular point (outlier) problem are the technical bottlenecks restricting its development. To address these shortcomings, this paper proposes an improved K-means clustering text mining method based on mean-density center estimation. First, the random selection of initial cluster centers is replaced by an estimate based on mean density. Second, an adaptive mechanism is designed for selecting the neighborhood shape. Third, singular points are eliminated by combining mean density with a threshold. Experimental results show that these measures improve the text mining performance of the K-means clustering method and yield substantially higher precision. The proposed method not only outperforms the general K-means clustering method but also has a certain advantage over the recently popular self-organizing neural network clustering method.
Keywords: data mining; text mining; mean density; clustering center; singular point
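
The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea it describes: seed K-means from dense regions instead of at random, and discard low-density points before clustering. Everything specific here is an assumption for illustration and not the authors' method: the density definition (neighbor count within a fixed Euclidean radius), the parameter names `radius` and `outlier_ratio`, the seed-separation rule, and the function names. The adaptive neighborhood shape mentioned in the abstract is not reproduced. For text mining, each point would be a document vector (for example, term weights).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_densities(points, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def density_seeded_kmeans(points, k, radius, outlier_ratio=0.5, max_iter=100):
    """K-means with density-based seeding and outlier removal (illustrative)."""
    dens = local_densities(points, radius)
    mean_density = sum(dens) / len(dens)

    # Hypothetical outlier rule: drop points whose density falls below
    # a fraction of the mean density.
    kept = [(p, d) for p, d in zip(points, dens) if d >= outlier_ratio * mean_density]
    kept.sort(key=lambda pd: -pd[1])  # densest first; ties keep input order

    # Hypothetical seeding rule: take the densest points that lie more than
    # `radius` away from every seed chosen so far.
    centers = []
    for p, _ in kept:
        if all(euclidean(p, c) > radius for c in centers):
            centers.append(tuple(float(x) for x in p))
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("could not find k well-separated dense seeds")

    data = [p for p, _ in kept]
    # Standard Lloyd iterations from the density-based seeds.
    for _ in range(max_iter):
        clusters = assign(data, centers)
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)


if __name__ == "__main__":
    # Two tight groups plus one isolated point that the density rule discards.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    centers, clusters = density_seeded_kmeans(pts, k=2, radius=1.5)
    print(centers)
    print(clusters)
```

Because the seeds are chosen deterministically from the data, repeated runs on the same input give the same clustering, which is the practical difference from random initialization that the abstract emphasizes.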
This article is indexed in CNKI and other databases.