Text clustering based on asymmetric similarity

Cite this article:	SONG Shaoxu, LI Chunping. Text clustering based on asymmetric similarity[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(7): 1325-1328.

Authors:	SONG Shaoxu, LI Chunping

Affiliation:	School of Software, Tsinghua University, Beijing 100084

Abstract:	Text clustering data are sparse, and common clustering methods rely on distance-based dissimilarity. To strengthen the discriminating features of documents, this paper proposes an asymmetric similarity approach for measuring the association between document objects. An asymmetric similarity measure between text objects is defined. Exploiting the sparsity of the asymmetric similarity matrix, the documents are clustered by partitioning them into strongly connected components. Iterating this procedure yields a concept hierarchy over the clustering results. Experiments on textual data sets show that asymmetric similarity gives higher accuracy and shorter run time than distance-based dissimilarity. When the number of result clusters is small, accuracy improves by about 20%.

Keywords:	machine learning; text information processing; text clustering

Article ID:	1000-0054(2006)07-1325-04

Revision date:	8 April 2005
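
The abstract does not give the similarity formula or the clustering details, so the following is only a minimal sketch of the general idea, under stated assumptions. The measure `asymmetric_similarity` (the fraction of one document's terms that also occur in the other), the `threshold` parameter, and the use of Kosaraju's algorithm to find strongly connected components are illustrative choices, not the authors' definitions.

```python
from collections import defaultdict


def asymmetric_similarity(a, b):
    """Assumed measure: fraction of a's terms that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def build_graph(docs, threshold):
    """Add a directed edge i -> j when sim(i -> j) >= threshold (hypothetical cutoff)."""
    graph = defaultdict(list)
    for i, a in enumerate(docs):
        for j, b in enumerate(docs):
            if i != j and asymmetric_similarity(a, b) >= threshold:
                graph[i].append(j)
    return graph


def strongly_connected_components(n, graph):
    """Kosaraju's algorithm (iterative) over nodes 0..n-1."""
    # Pass 1: record nodes in order of DFS completion.
    visited = [False] * n
    order = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()
    # Build the reversed graph.
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    # Pass 2: sweep the reversed graph in decreasing completion order.
    assigned = [False] * n
    components = []
    for start in reversed(order):
        if assigned[start]:
            continue
        assigned[start] = True
        component, todo = [], [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in reverse.get(node, ()):
                if not assigned[prev]:
                    assigned[prev] = True
                    todo.append(prev)
        components.append(sorted(component))
    return sorted(components)


def cluster(docs, threshold):
    """Each strongly connected component of the similarity graph is one cluster."""
    return strongly_connected_components(len(docs), build_graph(docs, threshold))


# Toy documents represented as term sets (made-up data).
docs = [
    {"cluster", "text", "similarity"},
    {"cluster", "text", "similarity", "matrix"},
    {"graph", "component"},
    {"graph", "component", "strong"},
    {"kernel"},
]
print(cluster(docs, 0.7))
print(cluster(docs, 0.6))
```

In this toy data, document 2 is fully covered by document 3 (similarity 1.0), while document 3 is only two-thirds covered by document 2. At a threshold of 0.7 the link is one-directional, so the two stay in separate clusters; at 0.6 both directions qualify and they merge. Relaxing the threshold is one hypothetical way to obtain coarser levels of a hierarchy; the abstract does not say how the paper's iteration works.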
This article is indexed in CNKI, Wanfang Data, and other databases.