
An Efficient Text Clustering Algorithm Based on Semantic Distance

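Cite this article:	Feng Shao-rong, Xiao Wen-jun. An Efficient Text Clustering Algorithm Based on Semantic Distance[J]. Journal of South China University of Technology (Natural Science Edition), 2008, 36(5): 30-37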

Authors:	Feng Shao-rong (冯少荣), Xiao Wen-jun (肖文俊)

Affiliation:	School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

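Abstract:	A new method for clustering text on the basis of semantics is proposed. The method analyzes documents at the semantic level and uses each document's specific semantics to compute the similarity between documents, which makes the clustering results more reasonable. Clustering is carried out mainly with the nearest-neighbor clustering algorithm, and a second-pass clustering algorithm is proposed to address the nearest-neighbor algorithm's sensitivity to input order. Cluster feature words are selected by similarity weight, with weaker candidates eliminated, so that the remaining feature words move progressively closer to the cluster's topic. Experimental results show that the proposed algorithm outperforms VSM-based K-Means clustering in both clustering precision and recall.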
Keywords:	text clustering; semantic distance; similarity; nearest-neighbor clustering
Received:	2007-06-27
Revised:	2007-09-03
An Efficient Text Clustering Algorithm Based On Semantic Distance

Feng Shao-rong,Xiao Wen-jun. An Efficient Text Clustering Algorithm Based On Semantic Distance[J]. Journal of South China University of Technology(Natural Science Edition), 2008, 36(5): 30-37

Authors:	Feng Shao-rong, Xiao Wen-jun

Abstract:	A new text clustering algorithm based on semantic distance is proposed. The method analyzes each text at the semantic level and uses its specific semantics to compute text similarity, which yields more reasonable clustering results. Clustering relies mainly on nearest-neighbor clustering, and a second clustering pass is proposed to mitigate the sensitivity of nearest-neighbor clustering to the input order of the documents. Feature words representing each cluster are chosen according to similarity weight, so that the words retained come to resemble the cluster's main theme. Experiments indicate that the proposed algorithm outperforms the VSM+K-Means algorithm in both clustering precision and recall.

Keywords:	text clustering; semantic distance; similarity; nearest-neighbor clustering
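
The abstract names two steps, a nearest-neighbor clustering pass and a second pass that reduces sensitivity to input order, without giving their details. The following is only a rough sketch of how such a two-pass scheme might look, not the authors' method. Everything in it is an assumption made for illustration: Jaccard overlap of word sets stands in for the paper's semantic-distance similarity, average member similarity stands in for whatever cluster representation the paper uses, and the function names and the `threshold` parameter are hypothetical.

```python
from typing import Callable, List, Set

Doc = Set[str]
Similarity = Callable[[Doc, Doc], float]


def jaccard(a: Doc, b: Doc) -> float:
    """Stand-in similarity (assumption): NOT the paper's semantic distance."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(doc: Doc, members: List[int], docs: List[Doc],
                       sim: Similarity) -> float:
    """Average similarity between a document and a non-empty cluster."""
    return sum(sim(doc, docs[j]) for j in members) / len(members)


def nearest_neighbor_pass(docs: List[Doc], threshold: float,
                          sim: Similarity = jaccard) -> List[List[int]]:
    """First pass: join the most similar existing cluster, or open a new one.

    The result depends on the order of `docs`, which is the weakness the
    abstract attributes to nearest-neighbor clustering.
    """
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for c, members in enumerate(clusters):
            s = cluster_similarity(doc, members, docs, sim)
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([i])       # nothing similar enough: new cluster
        else:
            clusters[best].append(i)
    return clusters


def second_pass(docs: List[Doc], clusters: List[List[int]], threshold: float,
                sim: Similarity = jaccard) -> List[List[int]]:
    """Second pass (assumed form): re-check every document against all
    first-pass clusters and move it if another cluster fits better."""
    reassigned: List[List[int]] = [[] for _ in clusters]
    for home, members in enumerate(clusters):
        for i in members:
            best, best_sim = home, threshold   # stay put unless beaten
            for c, other in enumerate(clusters):
                rest = [j for j in other if j != i]  # exclude the doc itself
                if not rest:
                    continue
                s = cluster_similarity(docs[i], rest, docs, sim)
                if s > best_sim:
                    best, best_sim = c, s
            reassigned[best].append(i)
    return [sorted(c) for c in reassigned if c]
```

Calling `nearest_neighbor_pass(docs, threshold)` and then passing its result to `second_pass(docs, clusters, threshold)` gives the final grouping as lists of document indices. The second pass here compares each document against the complete first-pass clusters, so an early document can move to a cluster that did not yet exist when it was first assigned.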
This article is indexed in VIP, Wanfang Data, and other databases.
