球型模糊c均值算法在中文文本聚类中的应用 (Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents)

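Cite as:	HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei. 球型模糊c均值算法在中文文本聚类中的应用 [Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents][J]. 系统仿真学报 (Journal of System Simulation), 2004, 16(3): 516-518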

Authors:	黄钢石 (HUANG Gang-shi), 陆建江 (LU Jian-jiang), 张亚非 (ZHANG Ya-fei)

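Affiliations:	1. PLA University of Science and Technology, Nanjing 210007; 2. PLA University of Science and Technology, Nanjing 210007, and Department of Computer Science and Engineering, Southeast University, Nanjing 210096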

Funding:	Supported by the Young Scientists Fund of the National Natural Science Foundation of China (60303024)

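Abstract:	Typical clustering algorithms can place a given text in only one class, yet real texts often belong to several classes. This paper proposes a method for clustering Chinese texts based on the spherical fuzzy c-means algorithm. The method considers only the direction of each text vector and ignores its magnitude. It also takes full account of the degree to which a text belongs to each class, and it can place a given text in several classes by means of a user-specified threshold. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but can also identify texts that belong to several classes.
Keywords:	Chinese text; spherical fuzzy c-means algorithm; clustering; text mining
Article ID:	1004-731X(2004)03-0516-03
Revised:	2003-05-17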
Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents

HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei. Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents[J]. Journal of System Simulation, 2004, 16(3): 516-518

Authors:	HUANG Gang-shi, LU Jian-jiang, ZHANG Ya-fei

Affiliation:	HUANG Gang-shi (1), LU Jian-jiang (1, 2), ZHANG Ya-fei (1)

Abstract:	Conventional clustering algorithms assign each document to a single class, but in practice a document often belongs to several classes. This paper presents a clustering method for Chinese documents based on the spherical fuzzy c-means algorithm. The method considers only the direction of the document vectors, not their magnitude. It also takes full account of the degree to which each document belongs to each class, and it can assign a document to several classes using a user-specified threshold. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but also identifies documents that belong to several classes.

Keywords:	Chinese documents; spherical fuzzy c-means algorithm; clustering; text mining
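
The abstract names three ingredients: document vectors compared by direction only, graded class memberships, and a user threshold that can place a document in several classes. It does not give the update equations, so the following is only a minimal sketch of one common formulation of spherical fuzzy c-means, not the authors' implementation. The cosine dissimilarity 1 - cos, the fuzzifier m = 2, the random initialisation, and the names spherical_fcm and assign_classes are all assumptions made for illustration. The sketch also assumes that no document vector is all zeros.

```python
import numpy as np


def spherical_fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Sketch of spherical fuzzy c-means on the rows of X (documents x terms)."""
    rng = np.random.default_rng(seed)
    # Keep only direction: scale every document vector to unit length.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Assumed initialisation: randomly chosen documents serve as centroids.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    memberships = np.full((len(X), n_clusters), 1.0 / n_clusters)
    for _ in range(n_iter):
        # Cosine dissimilarity; the small floor avoids division by zero.
        dist = np.maximum(1.0 - X @ centroids.T, 1e-12)
        # Fuzzy c-means style membership update; each row sums to 1.
        inv = dist ** (-1.0 / (m - 1.0))
        new_memberships = inv / inv.sum(axis=1, keepdims=True)
        # Membership-weighted mean direction, projected back onto the sphere.
        weighted = (new_memberships ** m).T @ X
        centroids = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)
        shift = np.abs(new_memberships - memberships).max()
        memberships = new_memberships
        if shift < tol:
            break
    return memberships, centroids


def assign_classes(memberships, threshold):
    """List, per document, every cluster whose membership reaches threshold."""
    return [np.flatnonzero(row >= threshold).tolist() for row in memberships]
```

Calling spherical_fcm on a term-weight matrix and passing the returned memberships to assign_classes with, say, a threshold of 0.4 would let a document that sits between two clusters be reported in both, while a document with one dominant membership is reported in a single class. Because rows are normalised first, rescaling a document's vector leaves its memberships unchanged.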
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
