A New Feature Selection Method for Text Clustering
Authors: XU Junling [1], XU Baowen [1,2], ZHANG Weifeng [3], CUI Zifeng [1], ZHANG Wei [1]
Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China; [2] State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China; [3] Department of Computer Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Funding: Supported by the National Natural Science Foundation of China (60503020, 60373066), the Outstanding Young Scientist's Fund (60425206), the Natural Science Foundation of Jiangsu Province (BK2005060), and the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University
Abstract: Feature selection methods have been applied successfully to text categorization but seldom to text clustering, because class label information is unavailable. This paper proposes a new feature selection method for text clustering based on expectation maximization and cluster validity. The method applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering, and uses the Davies-Bouldin index to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments on several popular datasets show the advantages of the proposed method.

Keywords (Chinese record): text clustering  feature selection  data preprocessing  network
Article ID: 1007-1202(2007)05-0912-05
Revised: 2007-02-18
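
The abstract states that intermediate feature subsets are evaluated with the Davies-Bouldin index, where a lower value indicates more compact and better separated clusters. The following is a minimal Python sketch of that index and of one possible selection rule, not the authors' implementation. The helper names `davies_bouldin`, `project`, and `pick_subset` are hypothetical, the cluster labels are assumed to be fixed while subsets are compared, and choosing the subset with the lowest index is an assumption: the abstract says only that subsets are selected "according to the curve" of the index.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    cents = {lab: centroid(members) for lab, members in clusters.items()}
    # Scatter of a cluster: mean Euclidean distance of members to the centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        worst = 0.0
        for j in clusters:
            if j == i:
                continue
            separation = math.dist(cents[i], cents[j])
            # Coincident centroids give an unbounded ratio.
            ratio = math.inf if separation == 0 else (scatter[i] + scatter[j]) / separation
            worst = max(worst, ratio)
        total += worst
    return total / len(clusters)


def project(points, features):
    """Keep only the listed feature indices of every vector."""
    return [[p[f] for f in features] for p in points]


def pick_subset(points, labels, candidate_subsets):
    """Assumed rule: return the candidate subset with the lowest index."""
    return min(
        candidate_subsets,
        key=lambda feats: davies_bouldin(project(points, feats), labels),
    )
```

For example, with four two-feature vectors in which only feature 0 separates the two clusters, `pick_subset(points, labels, [[0], [1], [0, 1]])` returns the subset containing feature 0 alone. In a full pipeline the labels would come from the intermediate clustering result of each iteration rather than being supplied by hand.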

XU Junling, XU Baowen, ZHANG Weifeng, CUI Zifeng, ZHANG Wei. A New Feature Selection Method for Text Clustering[J]. Wuhan University Journal of Natural Sciences, 2007, 12(5): 912-916.
Biography: XU Junling (1984–), male, Ph.D. candidate, research direction: statistical pattern recognition, machine learning and data mining.
Keywords: feature selection  text clustering  unsupervised learning  data preprocessing
This article is indexed in VIP (维普), SpringerLink, and other databases.