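An Improved Algorithm for Text Feature Selection
Cite this article: GUO Xiao-dong, JIANG Yu-ming, FEI Fei. An Improved Algorithm for Text Feature Selection[J]. Journal of Jilin University (Information Science Edition), 2012, 30(5): 544-548. DOI: 10.3969/j.issn.1671-5896.2012.05.018
Authors: GUO Xiao-dong  JIANG Yu-ming  FEI Fei
Affiliations: 1. Changchun Engineering Consulting Service Center, Changchun 130042; 2. School of Physics and Engineering, Sun Yat-sen University, Zhongshan, Guangdong 510006; 3. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240
Abstract: The traditional mutual information feature selection method is strongly affected by marginal probabilities, so a rare word may receive a higher score than a common word, which biases selection toward low-frequency terms. To address this, after analyzing several traditional feature extraction methods, two parameters, dispersion and average term frequency, are introduced to tie the mutual information method to the term frequency of each feature, making mutual-information-based classification more accurate. Experimental results show that the method gives better classification performance.

Keywords: text classification  feature selection  mutual information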

Improved Feature Selection Method
GUO Xiao-dong, JIANG Yu-ming, FEI Fei. Improved Feature Selection Method[J]. Journal of Jilin University: Information Sci Ed, 2012, 30(5): 544-548. DOI: 10.3969/j.issn.1671-5896.2012.05.018
Authors:GUO Xiao-dong    JIANG Yu-ming    FEI Fei
Affiliation: 1. Changchun Engineering Consulting Service Center, Changchun 130042, China; 2. School of Physics and Engineering, Sun Yat-sen University, Zhongshan 510006, China; 3. School of Electronic Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240, China
Abstract: The traditional mutual information feature selection method is strongly affected by marginal probabilities, which can give rare words higher scores than common words and so bias selection toward low-frequency words. To address this shortcoming, we analyze several traditional feature extraction methods and tie the mutual information method to word frequency by introducing two parameters, dispersion and average word frequency, which makes mutual-information-based classification more accurate. Experiments show that the method yields better classification results.
Keywords:text classification  feature selection  mutual information
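
The low-frequency bias described in the abstract can be illustrated with a minimal sketch. It assumes the standard pointwise mutual information score, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. The function name and all counts are hypothetical and chosen only for illustration. The paper's dispersion and average-frequency corrections are not reproduced here, because the abstract does not state their formulas.

```python
import math


def mutual_information(n_tc, n_t, n_c, n):
    """Pointwise mutual information between a term t and a class c.

    n_tc: documents of class c that contain t
    n_t:  documents (any class) that contain t
    n_c:  documents of class c
    n:    total documents
    Computes log(P(t, c) / (P(t) * P(c))) with count-based estimates.
    """
    return math.log((n_tc * n) / (n_t * n_c))


# Hypothetical corpus: 1000 documents, 100 of them in class c.
N, N_C = 1000, 100

# A rare term that occurs in a single document, which happens to be in class c.
rare = mutual_information(n_tc=1, n_t=1, n_c=N_C, n=N)

# A common term that occurs in 80 of the 100 class-c documents and 80 others.
common = mutual_information(n_tc=80, n_t=160, n_c=N_C, n=N)

print(f"rare term score:   {rare:.3f}")
print(f"common term score: {common:.3f}")
```

In this toy setting the rare term scores log(10) and the common term log(5), so the one-off word outranks a word that covers most of the class. This is the behavior that frequency-aware corrections are meant to counter.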
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.