
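An Improved Feature Selection Algorithm for Text Clustering Based on the DF Algorithm
Cite this article: 樊东辉, 王治和, 陈建华, 许虎寅. An Improved Feature Selection Algorithm for Text Clustering Based on the DF Algorithm[J]. 甘肃联合大学学报(自然科学版), 2012(1): 51-54.
Authors: 樊东辉  王治和  陈建华  许虎寅
Affiliations: School of Mathematics and Information Science, Northwest Normal University; Zhumadian Vocational and Technical College, Henan
Abstract: This paper studies the problem of weight calculation in text feature selection and proposes a weight calculation method that weights feature terms by an entropy function. The method considers not only the document frequency of each feature term but also the number of times it occurs within documents, so that the selected feature subset is more representative. Experiments show that the improved algorithm yields some improvement in clustering results.

Keywords: feature selection  document frequency  term frequency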

Improved Feature Selection Algorithm based on DF Algorithm for Text Clustering
FAN Dong-hui, WANG Zhi-he, CHEN Jian-hua, XU Hu-yin. Improved Feature Selection Algorithm based on DF Algorithm for Text Clustering[J]. Journal of Gansu Lianhe University: Natural Sciences, 2012(1): 51-54.
Authors:FAN Dong-hui  WANG Zhi-he  CHEN Jian-hua  XU Hu-yin
Institution: 1. School of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070, China; 2. Zhumadian Vocational and Technical College, Zhumadian 463000, China
Abstract: This paper studies weight calculation in text feature selection and proposes a weighting method based on the entropy of feature words. The method considers not only the document frequency of each feature word but also the number of times it occurs within a document, so the selected feature subset is more representative. Experiments show that the improved algorithm brings a certain improvement in clustering results.
Keywords:feature selection  document frequency  word frequency
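
The abstract does not state the weighting formula, so the following Python sketch is only one possible reading of "document frequency weighted by an entropy function", not the authors' method. It assumes the score DF(t) * (1 + H(t) / log N), where H(t) is the entropy of the distribution of term t's occurrences over the documents that contain it and N is the number of documents. The function names `entropy_weighted_df` and `select_features` and the exact way DF and entropy are combined are hypothetical.

```python
import math
from collections import Counter


def entropy_weighted_df(docs):
    """Return {term: score} for a list of tokenized documents.

    ASSUMPTION (not taken from the paper): score = DF * (1 + H / log N),
    where H is the entropy of the term's occurrence counts across the
    documents containing it, and N is the number of documents.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]   # per-document term frequencies
    vocab = set().union(*counts)              # all distinct terms
    scores = {}
    for term in vocab:
        tfs = [c[term] for c in counts if term in c]
        df = len(tfs)                         # document frequency
        total = sum(tfs)                      # total occurrences in the corpus
        entropy = sum(-(tf / total) * math.log(tf / total) for tf in tfs)
        norm = entropy / math.log(n_docs) if n_docs > 1 else 0.0
        scores[term] = df * (1.0 + norm)
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = entropy_weighted_df(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


docs = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
print(select_features(docs, 2))  # ['a', 'b']
```

Under this assumed score, a term that occurs in a single document reduces to plain DF (entropy 0), while terms with equal DF are separated by how their occurrences are spread across documents.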
This article is indexed in CNKI and other databases.