
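K-Means Algorithm for Document Clustering with Optimal Initial Values
Cite this article: CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen. K-Means Algorithm for Document Clustering with Optimal Initial Values[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2008, 32(2): 206-210.
Authors: CHEN Yuan-yuan  QU Zhi-yi  ZHANG Heng-long  LIAO Shao-wen
Affiliations: 1. School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000
2. School of Computer Science, Xidian University, Xi'an, Shaanxi 710071
Abstract: K-means is a partitioning method commonly used in document clustering. In recent years, many improved algorithms that optimize the initial centers have appeared in order to improve clustering quality. Building on a density-based center selection algorithm, this paper establishes a similarity probability model to help determine the density parameter, which effectively reduces the blindness of parameter selection. The paper also proposes a binary search method for quickly determining the optimal value of K. Extensive experimental results show that the method achieves the desired effect.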

Keywords: document clustering  K-means  vector space model  partition-based clustering algorithm
Article ID: 1000-5862(2008)02-0206-05
Revision date: December 1, 2007

K-Means Algorithm for Document Clustering with Optimal Initial Values
CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen. K-Means Algorithm for Document Clustering with Optimal Initial Values[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2008, 32(2): 206-210.
Authors: CHEN Yuan-yuan  QU Zhi-yi  ZHANG Heng-long  LIAO Shao-wen
Abstract: K-means is a widely used partitioning method in document clustering. Recently, many improved algorithms that optimize the initial centers have been proposed to improve clustering quality. In this paper, a similarity probability model is built on top of a density-based initial center search algorithm to help determine the density parameter, which effectively reduces the blindness of parameter selection. Furthermore, the paper proposes a binary search approach to rapidly identify the optimal value of K. Extensive experimental results show that the method achieves the desired effect.
Keywords: document clustering  K-means  vector space model  partition-based clustering algorithm
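
The abstract names density-based selection of initial centers but does not give its details, so the following is only a minimal sketch of one common variant, not the authors' algorithm. It assumes documents are rows of a term-weight matrix (vector space model), uses cosine similarity, and defines a document's density as the number of still-unassigned documents whose similarity to it reaches a threshold. The function name `density_initial_centers` and the parameter `sim_threshold` are hypothetical; the paper's similarity probability model for choosing the density parameter and its binary search for K are not reproduced here.

```python
import numpy as np

def density_initial_centers(X, k, sim_threshold):
    """Return row indices of up to k initial centers chosen by local density.

    X: (n_docs, n_terms) array of document vectors (assumed representation).
    sim_threshold: cosine similarity above which two documents count as
        neighbors (hypothetical stand-in for the paper's density parameter).
    """
    X = np.asarray(X, dtype=float)
    # L2-normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero documents as zero vectors
    U = X / norms
    neighbors = (U @ U.T) >= sim_threshold  # boolean neighbor matrix

    available = np.ones(len(X), dtype=bool)
    centers = []
    while len(centers) < k and available.any():
        # Density = number of still-available neighbors of each document.
        density = (neighbors & available).sum(axis=1)
        density[~available] = -1  # never re-pick a consumed document
        best = int(np.argmax(density))
        centers.append(best)
        # Remove the chosen document and its neighborhood so the next
        # center comes from a different dense region.
        available &= ~neighbors[best]
        available[best] = False
    return centers
```

The returned indices can seed a standard K-means run, for example by passing `X[centers]` as the initial centroids. Fewer than `k` indices are returned if every document is consumed first, which in this sketch is a hint that `sim_threshold` is too low for the requested `k`.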
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.