Combined Clustering Algorithm for Words and Documents Based on Spectral Clustering
Citation: ZHANG Ji-wen, CHEN Xiao-rong. Combined Clustering Algorithm for Words and Documents Based on Spectral Clustering[J]. Journal of Guizhou University (Natural Science) (贵州大学学报(自然科学版)), 2014, 31(5): 53-57
Authors: ZHANG Ji-wen (张吉文), CHEN Xiao-rong (陈笑蓉)
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
Abstract: Document clustering and word clustering are both important, well-studied problems. Most existing algorithms cluster documents and words separately rather than simultaneously. This paper models a document collection as a bipartite graph between documents and words, so that the joint clustering problem can be posed as a bipartite graph partitioning problem. To solve this partitioning problem, we use a new combined spectral clustering algorithm that uses the singular vectors of a moderate-scale word-document matrix to produce good partitions. The spectral algorithm achieves some of the best performance, and the singular vectors are shown to solve a continuous relaxation of the NP-complete graph bipartitioning problem. Experimental results verify that the combined clustering algorithm is very effective in practice.

Keywords: spectral clustering; combined clustering; graph partitioning; singular vectors
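
The abstract names the ingredients of the method (a word-document bipartite graph, singular vectors, a two-way partition) but does not list the steps. The following is a minimal sketch of one common way to realize this idea, not the authors' implementation. It assumes the word-document matrix is degree-normalized on both sides, that the second left and right singular vectors carry the partition, and that splitting their rescaled entries at zero is an acceptable stand-in for a one-dimensional clustering step. The function name `spectral_cocluster_bipartition`, the normalization, and the zero threshold are all assumptions introduced here for illustration.

```python
import numpy as np


def spectral_cocluster_bipartition(A):
    """Split words (rows) and documents (columns) of A into two co-clusters.

    A is a nonnegative word-by-document matrix with at least two rows and
    two columns; every row and column must have a positive sum.
    Returns (word_labels, doc_labels), each an array of 0/1 labels.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)  # degree of each word vertex in the bipartite graph
    col_deg = A.sum(axis=0)  # degree of each document vertex
    if np.any(row_deg <= 0) or np.any(col_deg <= 0):
        raise ValueError("every word and document needs a positive total weight")

    # Assumed scaling (not stated in the abstract): A_n = D1^{-1/2} A D2^{-1/2}
    d1 = 1.0 / np.sqrt(row_deg)
    d2 = 1.0 / np.sqrt(col_deg)
    A_n = d1[:, None] * A * d2[None, :]

    # Singular values come back in descending order, so index 1 is the second pair.
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    u2, v2 = U[:, 1], Vt[1, :]

    # Undo the scaling so word and document scores live on a common axis.
    word_score = d1 * u2
    doc_score = d2 * v2

    # Simplification: threshold at zero instead of running 1-D k-means.
    word_labels = (word_score > 0).astype(int)
    doc_labels = (doc_score > 0).astype(int)
    return word_labels, doc_labels


if __name__ == "__main__":
    # Hypothetical toy data: words 0-2 occur mostly in documents 0-1,
    # words 3-5 mostly in documents 2-3, with two weak cross links.
    toy = np.array([
        [3, 2, 0, 0],
        [2, 3, 0, 0],
        [2, 2, 1, 0],
        [0, 0, 3, 2],
        [0, 1, 2, 3],
        [0, 0, 2, 2],
    ])
    words, docs = spectral_cocluster_bipartition(toy)
    print("word labels:", words)
    print("document labels:", docs)
```

Words and documents that receive the same label form one co-cluster. Which group is labeled 0 and which 1 depends on the sign the SVD routine happens to return, so only the grouping is meaningful. The sketch handles the two-cluster case only; extending it to more clusters would need additional singular vectors and a proper clustering step.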
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.