
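A Soft Clustering Algorithm for Scientific Literature Based on Fuzzy Similarity
Cite this article: MENG Hai-tao, CHEN Xiao-rong. A soft clustering algorithm for scientific literature based on fuzzy similarity[J]. Journal of Guizhou University (Natural Science), 2007, 24(2): 175-178
Authors: MENG Hai-tao  CHEN Xiao-rong
Affiliations: Department of Computer Science, Guizhou University, Guiyang, Guizhou 550025; Department of Computer Science, Yancheng Institute of Technology, Yancheng, Jiangsu 224003; Department of Computer Science, Guizhou University, Guiyang, Guizhou 550025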
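Abstract: This paper proposes a new soft clustering algorithm for documents. Keywords are mapped and expanded through each document's title and abstract, and the text vector space is constructed by weighting each keyword according to where it occurs. The pattern of change in between-class and within-class similarity during fuzzy maximum spanning tree clustering is used to identify the optimal number of clusters K and the hard clusters automatically. With each hard cluster as a core, the clustering similarity is reduced to the lower similarity so that the cluster is extended, forming the corresponding soft cluster. Experiments show that the algorithm effectively reduces feature dimensionality and improves the accuracy and speed of soft clustering.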
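The abstract does not give the formulas, so the following is only a minimal Python sketch of the two clustering stages it describes, under stated assumptions. The function names `hard_clusters` and `soft_clusters`, the thresholds `threshold` and `lower`, and the similarity matrix `sim` are all hypothetical. Hard clusters are taken as the connected components left after discarding similarity edges below `threshold`, which gives the same groups as cutting the weak edges of a maximum spanning tree. The automatic choice of K from within-class and between-class similarity is not reproduced here; both thresholds are simply passed in.

```python
from itertools import combinations


def hard_clusters(sim, threshold):
    """Group documents whose pairwise similarity chain stays >= threshold.

    Assumption: this stands in for cutting maximum-spanning-tree edges
    below `threshold`; both yield the same connected components.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)


def soft_clusters(sim, cores, lower):
    """Extend each hard core with outside documents that reach the
    hypothetical lower similarity `lower` to at least one core member."""
    n = len(sim)
    result = []
    for core in cores:
        extended = set(core)
        for d in range(n):
            if d not in core and any(sim[d][m] >= lower for m in core):
                extended.add(d)
        result.append(extended)
    return result


# Illustrative similarity matrix for five documents (made-up values).
sim = [
    [1.0, 0.9, 0.2, 0.1, 0.5],
    [0.9, 1.0, 0.3, 0.2, 0.4],
    [0.2, 0.3, 1.0, 0.8, 0.6],
    [0.1, 0.2, 0.8, 1.0, 0.3],
    [0.5, 0.4, 0.6, 0.3, 1.0],
]
cores = hard_clusters(sim, threshold=0.7)
soft = soft_clusters(sim, cores, lower=0.5)
print(cores)
print(soft)
```

In this toy input, document 4 joins no hard cluster at the upper threshold but is similar enough to members of both larger cores to be attached to each at the lower one, which is the overlap that distinguishes a soft clustering from a hard one.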

Keywords: scientific literature  feature extraction  similarity  soft clustering
Article ID: 1000-5269(2007)02-0175-04
Revised manuscript received: 2006-12-08

Fuzzy similarity based document clustering algorithm
MENG Hai-tao,CHEN Xiao-rong. Fuzzy similarity based document clustering algorithm[J]. Journal of Guizhou University(Natural Science), 2007, 24(2): 175-178
Authors:MENG Hai-tao  CHEN Xiao-rong
Affiliation: 1. Department of Computer Science, Guizhou University, Guiyang 550025, China; 2. Department of Computer Science, Yancheng Institute of Technology, Yancheng, Jiangsu 224003, China
Abstract: The authors present a new algorithm for soft document clustering. Keywords are extracted from the title and abstract, and a weighted document vector space is constructed according to the positions of the keywords. The optimal number of clusters K and the hard clusters are determined automatically by applying the pattern of similarity change within and between classes during maximum spanning tree clustering. Centering on each hard cluster, the cluster similarity is decreased to the lower similarity to form the soft clustering. Experimental results indicate a large drop in feature dimension and an increase in speed and accuracy.
Keywords: science documents  feature extraction  similarity measures  soft clustering
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.