
K-Means Algorithm for Document Clustering with Optimal Initial Values

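Cite this article:	CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen. K-Means Algorithm for Document Clustering with Optimal Initial Values[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2008, 32(2): 206-210.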

Authors:	CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen

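Affiliations:	1. School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000; 2. School of Computer Science, Xidian University, Xi'an, Shaanxi 710071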

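Abstract:	K-means is a partitioning method commonly used in document clustering. In recent years, many improved algorithms that optimize the initial centers have appeared in order to raise clustering quality. Building on a density-based algorithm for selecting center points, this paper establishes a similarity probability model to help determine the density parameter, which effectively reduces the blindness of parameter selection. The paper also proposes a bisection method for quickly determining the optimal value of K. Extensive experimental results show that the method achieves the desired effect.
Keywords:	document clustering; K-means; vector space model; partition-based clustering algorithm
Article ID:	1000-5862(2008)02-0206-05
Revised:	December 1, 2007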
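
The abstract names density-based selection of initial centers but does not give the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes documents are already term-weight vectors (vector space model), uses cosine similarity, and takes the density parameter as a plain argument; the paper's similarity probability model for choosing that parameter is not reproduced here. The names `cosine`, `density_seeds`, and `sim_threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density_seeds(docs, k, sim_threshold):
    """Pick up to k initial centers from dense, mutually dissimilar documents.

    Assumption (not from the paper): the density of a document is the number
    of other documents whose cosine similarity to it is >= sim_threshold.
    """
    n = len(docs)
    sims = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    density = [
        sum(1 for j in range(n) if j != i and sims[i][j] >= sim_threshold)
        for i in range(n)
    ]
    # Visit documents from densest to sparsest (ties keep their input order).
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    seeds = []
    for i in order:
        # Skip candidates that lie in the neighbourhood of a chosen seed.
        if all(sims[i][s] < sim_threshold for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds


# Toy corpus: documents 0-2 share one vocabulary, documents 3-5 another.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [1.0, 1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
    [0.1, 0.0, 1.0, 1.0],
]
seeds = density_seeds(docs, k=2, sim_threshold=0.8)
print(seeds)  # indices of the documents chosen as initial centers
```

On this toy corpus the two seeds fall in different topic groups, which is the property a density-based initialization is meant to provide before the usual K-means iterations start.
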
K-Means Algorithm for Document Clustering with Optimal Initial Values

CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen. K-Means Algorithm for Document Clustering with Optimal Initial Values[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2008, 32(2): 206-210.

Authors:	CHEN Yuan-yuan, QU Zhi-yi, ZHANG Heng-long, LIAO Shao-wen

Abstract:	The K-means algorithm is a widely used partitioning method in document clustering. Recently, many improved algorithms that optimize the initial centers have been proposed to improve clustering quality. Building on a density-based algorithm for selecting initial centers, this paper constructs a similarity probability model to determine the density parameter, which effectively reduces the blindness of parameter selection. Furthermore, the paper proposes a binary search approach to rapidly identify the optimal value of K. Extensive experimental results show that the method achieves the desired results.

Keywords:	document clustering; K-means; vector space model; partition-based clustering algorithm
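
The abstract mentions a binary search for the optimal K without stating the criterion being optimized. As a rough illustration only, the sketch below assumes a cluster-quality score that rises to a single peak and then falls as K grows (unimodal in K), which is an assumption and not a claim from the paper. `bisect_best_k` and `toy_score` are hypothetical names; in practice the score would come from running K-means with that K and evaluating the resulting clusters.

```python
from functools import lru_cache


def bisect_best_k(score, k_min, k_max):
    """Return the K in [k_min, k_max] that maximizes score(K).

    Assumption: score is unimodal in K, so comparing score(mid) with
    score(mid + 1) tells which half of the range holds the peak.
    """
    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) < score(mid + 1):
            lo = mid + 1  # still climbing: the peak is to the right of mid
        else:
            hi = mid      # flat or falling: the peak is at mid or to its left
    return lo


calls = []  # records which K values were actually evaluated


@lru_cache(maxsize=None)
def toy_score(k):
    """Stand-in for an expensive clustering-quality evaluation (peak at K = 7)."""
    calls.append(k)
    return -(k - 7) ** 2


best_k = bisect_best_k(toy_score, 2, 50)
print(best_k, len(calls))
```

Caching the score matters because each evaluation would normally be a full clustering run; the search touches only a logarithmic number of K values instead of all 49 candidates in the range.
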
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
