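An Efficient Text Clustering Algorithm Based on Semantic Distance
Cite this article: Feng Shao-rong, Xiao Wen-jun. An Efficient Text Clustering Algorithm Based on Semantic Distance [J]. Journal of South China University of Technology (Natural Science Edition), 2008, 36(5): 30-37.
Authors: Feng Shao-rong, Xiao Wen-jun
Affiliation: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
Abstract: A new method for semantics-based text clustering is proposed. The method analyzes each document semantically and uses the document's specific semantics to compute the similarity between documents, which makes the clustering results more reasonable. Clustering is performed mainly with the nearest-neighbor clustering algorithm, and a second clustering pass is proposed to address the nearest-neighbor algorithm's sensitivity to input order. Cluster feature words are selected by survival of the fittest according to their similarity weights, so that the retained feature words come progressively closer to the cluster's topic. Experimental results show that the proposed algorithm outperforms the VSM-based K-Means clustering algorithm in both clustering precision and recall.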

Keywords: text clustering; semantic distance; similarity; nearest-neighbor clustering
Received: 2007-6-27
Revised: 2007-9-3

An Efficient Text Clustering Algorithm Based on Semantic Distance
Feng Shao-rong, Xiao Wen-jun. An Efficient Text Clustering Algorithm Based on Semantic Distance [J]. Journal of South China University of Technology (Natural Science Edition), 2008, 36(5): 30-37.
Authors: Feng Shao-rong, Xiao Wen-jun
Abstract: A new text clustering algorithm based on semantic distance is proposed. The method analyzes each text semantically and uses its specific semantics to compute text similarity, which yields more reasonable clustering results. The algorithm mainly uses nearest neighbor clustering, and a second clustering pass is proposed to remedy nearest neighbor clustering's sensitivity to the input order of the documents. Feature words representing each cluster are selected according to their similarity weights, so that the remaining feature words come progressively closer to the cluster's main theme. Experiments indicate that the proposed algorithm outperforms the VSM+K-Means algorithm in clustering precision and recall.
Keywords: text clustering; semantic distance; similarity; nearest neighbor clustering
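
The abstract describes a two-stage idea: a nearest neighbor clustering pass, followed by a second pass that reduces the dependence on document input order. The following is only a minimal sketch of that general idea, not the authors' algorithm. The function names (`nearest_neighbor_clustering`, `second_pass`), the threshold parameter, and the single-link rule are hypothetical, and Jaccard similarity over word sets stands in for the paper's semantic-distance similarity, which the abstract does not specify.

```python
from typing import Callable, List, Set

Doc = Set[str]
Sim = Callable[[Doc, Doc], float]


def jaccard(a: Doc, b: Doc) -> float:
    """Stand-in similarity (assumption); the paper uses a semantic distance."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_clustering(
    docs: List[Doc], threshold: float, sim: Sim = jaccard
) -> List[List[int]]:
    """First pass: put each document into the cluster holding its most
    similar earlier document, or open a new cluster if none reaches the
    threshold. The result depends on the order of `docs`."""
    clusters: List[List[int]] = []
    for idx, doc in enumerate(docs):
        best, best_sim = None, threshold
        for c_id, members in enumerate(clusters):
            s = max(sim(doc, docs[j]) for j in members)
            if s >= best_sim:
                best, best_sim = c_id, s
        if best is None:
            clusters.append([idx])
        else:
            clusters[best].append(idx)
    return clusters


def second_pass(
    docs: List[Doc], clusters: List[List[int]], threshold: float,
    sim: Sim = jaccard
) -> List[List[int]]:
    """Second pass (hypothetical variant): with all clusters known, move a
    document if some other cluster contains a strictly closer neighbor."""
    assignment = {i: c for c, members in enumerate(clusters) for i in members}
    new_clusters: List[List[int]] = [[] for _ in clusters]
    for i, doc in enumerate(docs):
        own = [j for j in clusters[assignment[i]] if j != i]
        # A singleton only moves if another cluster beats the threshold.
        best_sim = max(sim(doc, docs[j]) for j in own) if own else threshold
        best = assignment[i]
        for c, members in enumerate(clusters):
            if c == assignment[i]:
                continue
            s = max(sim(doc, docs[j]) for j in members)
            if s > best_sim:
                best, best_sim = c, s
        new_clusters[best].append(i)
    return [c for c in new_clusters if c]


docs = [
    {"cat", "dog"},
    {"stock", "market"},
    {"cat", "dog", "pet"},
    {"stock", "market", "price"},
]
first = nearest_neighbor_clustering(docs, threshold=0.5)
final = second_pass(docs, first, threshold=0.5)
```

In this toy input the two topics are already well separated, so the second pass leaves the first-pass clusters unchanged; its purpose is to correct assignments made early in the first pass, before later clusters existed.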
This article is indexed in VIP, Wanfang Data, and other databases.