
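Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents
Cite this article: HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei. Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents[J]. Journal of System Simulation, 2004, 16(3): 516-518
Authors: HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei
Affiliations: 1. PLA University of Science and Technology, Nanjing 210007
2. PLA University of Science and Technology, Nanjing 210007; Department of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding: Young Scientists Fund of the National Natural Science Foundation of China (60303024)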
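Abstract (translated from the Chinese): Conventional clustering algorithms can assign a given text to only one class, yet real texts often belong to several classes. This paper proposes a method for clustering Chinese texts based on the spherical fuzzy c-means algorithm. The method considers only the direction of the text vectors, not their magnitude. It also takes full account of the degree to which each text belongs to each class, and it can assign a text to several classes by means of a user-specified threshold. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but also finds the texts that belong to several classes.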

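Keywords (translated from the Chinese): Chinese text; spherical fuzzy c-means algorithm; clustering; text mining
Article ID: 1004-731X(2004)03-0516-03
Revision received: 2003-05-17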

Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents
HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei. Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents[J]. Journal of System Simulation, 2004, 16(3): 516-518
Authors: HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei
Affiliation: HUANG Gang-shi (1), LU Jian-jiang (1, 2), ZHANG Ya-fei (1)
Abstract: Conventional clustering algorithms assign each document to exactly one class, but in practice a document often belongs to several classes. This paper presents a clustering method for Chinese documents based on the spherical fuzzy c-means algorithm. The method considers only the direction of the document vectors and ignores their magnitude. It also takes full account of the degree to which each document belongs to each class, and a user-specified threshold allows a document to be assigned to several classes. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but also identifies documents that belong to several classes.
Keywords: Chinese documents; spherical fuzzy c-means algorithm; clustering; text mining
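
The abstract names three ingredients: direction-only document vectors, fuzzy membership degrees, and a user threshold for multi-class assignment. The sketch below illustrates how these could fit together. It is not the authors' implementation: the abstract does not give the update rules, so the code assumes one common formulation (unit-length vectors, cosine dissimilarity 1 - cos, fuzzifier m, centroids renormalized to unit length). The function names, parameters, and toy term-count matrix are all hypothetical, and the document vectors are assumed to be nonzero with nonnegative term weights.

```python
import numpy as np

def spherical_fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Assumed formulation of spherical fuzzy c-means (not taken from the paper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Keep only the direction of each document vector: scale rows to unit length.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize the centroids with c distinct documents.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    U = None
    for _ in range(n_iter):
        # Cosine dissimilarity between documents and centroids,
        # clipped away from zero so the power below stays finite.
        D = np.clip(1.0 - X @ centers.T, 1e-12, None)
        # Fuzzy membership update: each row of U_new sums to 1.
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: membership-weighted sum, projected back to the unit sphere.
        centers = (U_new ** m).T @ X
        centers /= np.linalg.norm(centers, axis=1, keepdims=True)
        converged = U is not None and np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def assign_classes(U, threshold):
    """Assign each document to every class whose membership reaches the threshold."""
    return [np.flatnonzero(row >= threshold).tolist() for row in U]

# Hypothetical term-count matrix: rows are documents, columns are terms.
X_toy = np.array([[3, 1, 0, 0], [2, 2, 0, 0], [4, 1, 0, 0],
                  [0, 0, 3, 1], [0, 0, 1, 2], [0, 0, 2, 2],
                  [1, 1, 1, 1]])
U_toy, centers_toy = spherical_fuzzy_c_means(X_toy, c=2)
labels_toy = assign_classes(U_toy, threshold=0.3)
```

Because every row is normalized before clustering, multiplying a document's term counts by a constant (a longer document with the same term proportions) leaves its memberships unchanged, which is the "direction, not size" property the abstract describes. Raising the threshold makes multi-class assignments rarer; a document whose memberships all fall below it is assigned to no class.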
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.