An improved k-means algorithm for document clustering

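Cite this article:	SUO Hong-guang, WANG Yu-wei. An improved k-means algorithm for document clustering[J]. Journal of Shandong University (Natural Science), 2008, 43(1): 60-64.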

Authors:	SUO Hong-guang, WANG Yu-wei

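Affiliations:	1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; School of Computer and Communication Engineering, China University of Petroleum, Dongying 257061, Shandong, China 2. School of Computer and Communication Engineering, China University of Petroleum, Dongying 257061, Shandong, China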

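Abstract:	k-means is a widely used algorithm for document clustering, but the local extremum it converges to often deviates considerably from the global optimum. To address this shortcoming, the algorithm is improved with an approach based on local search optimization, and a formula for the change in the objective function is derived. The clustering result is repartitioned according to the change in the objective function value, after which the k-means iterations continue, widening the search range. Theoretical analysis and experimental results show that the modified algorithm effectively improves clustering quality, while its computational complexity remains linear in the total number of documents in the data set.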
Keywords:	document clustering; k-means; vector space model; local iteration
Article ID:	1671-9352(2008)01-0060-05
Received:	2007-09-05
Revised:	2007-09-05
An improved k-means algorithm for document clustering

SUO Hong-guang, WANG Yu-wei. An improved k-means algorithm for document clustering[J]. Journal of Shandong University (Natural Science), 2008, 43(1): 60-64.

Authors:	SUO Hong-guang, WANG Yu-wei

Institution:	1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. School of Computer & Communication Engineering, China University of Petroleum, Dongying 257061, Shandong, China

Abstract:	The k-means algorithm is a popular method for document clustering, but it often gets stuck at a local optimum far from the global optimum. A procedure based on local search is used to improve the algorithm, and a formula for the change in the objective function is derived. The clustering result is repartitioned according to this change, and the k-means iterations then continue, which enlarges the search space. Theoretical analysis and experimental results show that the improved algorithm effectively raises the quality of k-means clustering, while its computational complexity remains linear in the number of documents in the collection.

Keywords:	document clustering; k-means; vector space model; local iteration
This article is indexed in VIP, Wanfang Data, and other databases.
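
The abstract describes alternating ordinary k-means iterations with a repartitioning step driven by the change in the objective function. The abstract does not give the paper's objective or its change formula, so the following is only a rough sketch under stated assumptions, not the authors' method: it assumes unit-length document vectors, a cosine-based objective that is maximized (the sum of the norms of the per-cluster sum vectors, as in spherical k-means), and a single-document move as the local search step. All function names (normalize_rows, objective, kmeans_pass, best_single_move, refined_kmeans) are hypothetical.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def objective(X, labels, k):
    """Assumed objective: sum over clusters of the norm of the cluster sum vector."""
    return sum(np.linalg.norm(X[labels == j].sum(axis=0)) for j in range(k))

def kmeans_pass(X, labels, k, max_iter=100):
    """Ordinary cosine k-means iterations until the assignments stop changing."""
    for _ in range(max_iter):
        sums = np.stack([X[labels == j].sum(axis=0) for j in range(k)])
        centroids = normalize_rows(sums)
        new_labels = np.argmax(X @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def best_single_move(X, labels, k):
    """Find the single-document move with the largest objective increase.

    Returns (delta, doc_index, target_cluster); the indices are None when
    no move increases the assumed objective.
    """
    sums = np.stack([X[labels == j].sum(axis=0) for j in range(k)])
    norms = np.linalg.norm(sums, axis=1)
    counts = np.bincount(labels, minlength=k)
    best = (0.0, None, None)
    for i, x in enumerate(X):
        a = labels[i]
        if counts[a] <= 1:
            continue  # never empty a cluster
        # Change in the source cluster's term when x leaves it.
        loss = np.linalg.norm(sums[a] - x) - norms[a]
        # Change in each candidate target cluster's term when x joins it.
        gains = np.linalg.norm(sums + x, axis=1) - norms
        gains[a] = -np.inf
        b = int(np.argmax(gains))
        delta = loss + gains[b]
        if delta > best[0]:
            best = (delta, i, b)
    return best

def refined_kmeans(X, k, seed=0, tol=1e-9, max_rounds=50):
    """Alternate k-means passes with single-document local search moves."""
    X = normalize_rows(np.asarray(X, dtype=float))
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    for _ in range(max_rounds):
        labels = kmeans_pass(X, labels, k)
        delta, i, b = best_single_move(X, labels, k)
        if i is None or delta <= tol:
            break  # no move improves the objective enough
        labels[i] = b  # repartition, then resume k-means
    return labels
```

In this sketch each search for a move costs on the order of n * k * d operations for n documents, k clusters, and d terms, so the cost per round stays linear in the number of documents, in line with the linear complexity the abstract claims. Calling refined_kmeans(X, k) on a document-term matrix X returns one cluster label per row.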