A Soft Clustering Algorithm for Scientific and Technical Literature Based on Fuzzy Similarity


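Cite this article:	MENG Hai-tao, CHEN Xiao-rong. A soft clustering algorithm for scientific and technical literature based on fuzzy similarity[J]. Journal of Guizhou University (Natural Science), 2007, 24(2): 175-178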

Authors:	MENG Hai-tao, CHEN Xiao-rong

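Affiliations:	Department of Computer Science, Guizhou University, Guiyang, Guizhou 550025; Department of Computer Science, Yancheng Institute of Technology, Yancheng, Jiangsu 224003; Department of Computer Science, Guizhou University, Guiyang, Guizhou 550025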

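Abstract:	This paper proposes a new soft clustering algorithm for documents. Keywords are mapped and expanded through each document's title and abstract, and keyword occurrences are weighted by position to construct the text vector space. The optimal number of clusters K and the hard clusters are identified automatically from the way inter-class and intra-class similarity change during fuzzy maximum spanning tree clustering. Taking the hard clusters as cores, the clustering similarity is lowered to a lower similarity bound to expand them, forming the corresponding soft clusters. Experiments show that the algorithm effectively reduces feature dimensionality and improves both the accuracy and the speed of soft clustering.
Keywords:	scientific and technical literature; feature extraction; similarity; soft clustering
Article number:	1000-5269(2007)02-0175-04
Revised:	2006-12-08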
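The feature-extraction step summarized in the abstract (keywords expanded through the title and abstract, then weighted by where they occur) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the position weights `TITLE_WEIGHT` and `ABSTRACT_WEIGHT`, the plain substring matching, and the helper names `build_vectors` and `fuzzy_similarity` are hypothetical, and the max-min ratio used here is one common fuzzy similarity measure, not necessarily the one the paper uses.

```python
# Hypothetical position weights: a keyword found in the title counts more than
# one found in the abstract. The paper's actual weights are not given here.
TITLE_WEIGHT = 3.0
ABSTRACT_WEIGHT = 1.0


def build_vectors(docs):
    """Build position-weighted keyword vectors.

    docs: list of dicts with 'title', 'abstract' and 'keywords' (list of str).
    Assumption: the feature space is the union of all author keywords, and each
    document is "expanded" by searching its title and abstract for every term
    in that vocabulary (lowercased substring counts, so a term nested inside a
    longer term is also counted).
    """
    vocab = sorted({kw.lower() for d in docs for kw in d["keywords"]})
    vectors = []
    for d in docs:
        title, abstract = d["title"].lower(), d["abstract"].lower()
        vec = [
            TITLE_WEIGHT * title.count(term) + ABSTRACT_WEIGHT * abstract.count(term)
            for term in vocab
        ]
        vectors.append(vec)
    return vocab, vectors


def fuzzy_similarity(a, b):
    """Max-min ratio sum(min)/sum(max), a fuzzy similarity in [0, 1]."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0
```

Only terms from the shared keyword vocabulary become dimensions, which is one simple way to keep the feature space small compared with using every word in the text.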
Fuzzy similarity based document clustering algorithm

MENG Hai-tao, CHEN Xiao-rong. Fuzzy similarity based document clustering algorithm[J]. Journal of Guizhou University (Natural Science), 2007, 24(2): 175-178

Authors:	MENG Hai-tao, CHEN Xiao-rong

Affiliation:	1. Department of Computer Science, Guizhou University, Guiyang 550025, China; 2. Department of Computer Science, Yancheng Institute of Technology, Yancheng, Jiangsu 224003, China

Abstract:	The authors present a new algorithm for soft clustering of documents. Keywords are extracted from the title and abstract, and a weighted document vector space is constructed according to the positions of the keywords. The optimal number of clusters K and the hard clusters are determined automatically from the way intra-class and inter-class similarity change during maximum spanning tree clustering. Centering on the hard clusters, the cluster similarity is decreased to a lower similarity bound to form the soft clusters. Experimental results show a large reduction in feature dimensionality and an increase in clustering speed and accuracy.

Keywords:	scientific documents; feature extraction; similarity measures; soft clustering
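The clustering stage described in the abstract (a maximum spanning tree over the similarity matrix, an automatic choice of K, then soft expansion around the hard clusters) can be sketched as follows. The paper's exact criterion for K, based on how intra- and inter-class similarity change, is not reproduced here; as a stand-in, this sketch cuts the tree where consecutive edge similarities drop the most. The function names `maximum_spanning_tree`, `hard_clusters` and `soft_clusters`, the parameter `lower`, and the rule that a document joins a cluster when it is at least `lower`-similar to any member are all hypothetical.

```python
def maximum_spanning_tree(sim):
    """Prim's algorithm on a symmetric similarity matrix; returns (w, i, j) edges."""
    n = len(sim)
    in_tree = [False] * n
    best = [float("-inf")] * n  # strongest link from each node into the tree
    parent = [-1] * n
    best[0] = 0.0  # start the tree at node 0
    edges = []
    for _ in range(n):
        u = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((sim[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                parent[v] = u
    return edges


def hard_clusters(sim):
    """Cut the tree at the largest drop in edge similarity (stand-in criterion)."""
    n = len(sim)
    edges = sorted(maximum_spanning_tree(sim), reverse=True)  # strongest first
    weights = [w for w, _, _ in edges]
    if len(weights) < 2:
        keep = len(weights)
    else:
        gaps = [weights[i] - weights[i + 1] for i in range(len(weights) - 1)]
        keep = gaps.index(max(gaps)) + 1  # keep edges before the biggest drop
    root = list(range(n))  # union-find over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in edges[:keep]:
        root[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())  # K = number of groups


def soft_clusters(sim, hard, lower):
    """Expand each hard cluster with documents at least `lower`-similar to a member."""
    soft = []
    for members in hard:
        core = set(members)
        extra = {v for v in range(len(sim))
                 if v not in core and max(sim[v][m] for m in members) >= lower}
        soft.append(sorted(core | extra))
    return soft
```

For example, with two tight groups {0, 1, 2} and {3, 4} bridged by a 0.45 similarity between documents 2 and 3, `hard_clusters` returns `[[0, 1, 2], [3, 4]]`, and `soft_clusters(sim, hard, 0.4)` lets documents 2 and 3 belong to both clusters, which is the overlap a soft clustering is meant to allow.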
This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.
