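Class Information Feature Selection Method for Text Classification
Cite this article: YU Jun-ying, WANG Ming-wen, SHENG Jun. Class information feature selection method for text classification[J]. Journal of Shandong University (Natural Science), 2006, 41(3): 10-13, 59.
Authors: YU Jun-ying (余俊英), WANG Ming-wen (王明文), SHENG Jun (盛俊)
Affiliation: College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022
Funding: Science and Technology Research Project of the Ministry of Education; Natural Science Foundation of Jiangxi Province
Abstract: With the rapid growth of electronic documents on the web, text classification has become increasingly important in information retrieval. As the number of feature dimensions grows, the statistical properties of the samples become harder to estimate, which lowers the classifier's generalization ability and causes over-fitting ("over-learning"). Selecting and extracting document features is therefore a necessary prerequisite for text classification. This paper proposes a feature selection method based on class information: while retaining as much document information as possible, the method also takes the class labels of the documents into account. Experiments show that the method classifies well; on micro-averaged measures in particular, it improves substantially over OCFS and the chi-square statistic.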
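Both abstracts name the chi-square statistic (CHI) and OCFS as baselines but give no formulas. The sketch below shows one common way to score terms with each, assuming a document-term matrix `X` (rows are documents) and a label vector `y`. It is not the authors' class-information method, whose formula is not stated here, and the function names `chi_square_scores`, `chi_max_scores`, `ocfs_scores`, and `top_k` are hypothetical. CHI uses the usual 2×2 term/class contingency counts and is aggregated by taking the maximum over classes (other work uses a weighted average). OCFS uses the orthogonal centroid criterion, summing (n_j/n)(m_j - m)^2 over classes j, where m_j is the centroid of class j and m the overall centroid.

```python
import numpy as np


def chi_square_scores(X, y):
    """Per-class chi-square (CHI) score of every term.

    X: (n_docs, n_terms) document-term matrix; a nonzero entry means the
       term occurs in that document (binary presence is assumed).
    y: (n_docs,) class labels.
    Returns {class: array of shape (n_terms,)}.
    """
    X = (np.asarray(X) > 0).astype(float)
    y = np.asarray(y)
    n = X.shape[0]
    scores = {}
    for c in np.unique(y):
        in_c = y == c
        A = X[in_c].sum(axis=0)      # term present, document in class c
        B = X[~in_c].sum(axis=0)     # term present, document not in c
        C = in_c.sum() - A           # term absent, document in c
        D = (~in_c).sum() - B        # term absent, document not in c
        num = n * (A * D - C * B) ** 2
        den = (A + C) * (B + D) * (A + B) * (C + D)
        # den is 0 when a term occurs in no document or in every document,
        # or when there is only one class; those scores are set to 0.
        scores[c] = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return scores


def chi_max_scores(X, y):
    """Aggregate CHI over classes by taking the maximum (one common choice)."""
    return np.max(np.vstack(list(chi_square_scores(X, y).values())), axis=0)


def ocfs_scores(X, y):
    """OCFS criterion: sum over classes of (n_j / n) * (m_j - m)**2 per term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    m = X.mean(axis=0)                   # overall centroid
    score = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        m_c = X[in_c].mean(axis=0)       # centroid of class c
        score += (in_c.sum() / n) * (m_c - m) ** 2
    return score


def top_k(scores, k):
    """Indices of the k highest-scoring terms (ties keep column order)."""
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    # Toy corpus: 4 documents, 3 terms, 2 classes.
    X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
    y = [0, 0, 1, 1]
    print(top_k(chi_max_scores(X, y), 2).tolist())  # -> [0, 1]
    print(top_k(ocfs_scores(X, y), 2).tolist())     # -> [0, 1]
```

In the toy corpus, terms 0 and 1 each occur in exactly one class and score highest under both criteria, while term 2 occurs once in each class and scores 0.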

Keywords: feature selection; text classification; between-class distribution; within-class distribution
Article ID: 1671-9352(2006)03-0010-04
Received: 2006-03-29
Revised: 2006-03-29

Class information feature selection method for text classification
YU Jun-ying, WANG Ming-wen, SHENG Jun. Class information feature selection method for text classification[J]. Journal of Shandong University (Natural Science), 2006, 41(3): 10-13, 59.
Authors:YU Jun-ying  WANG Ming-wen  SHENG Jun
Institution:College of Computer Information Engineering, Jiangxi Normal Univ., Nanchang 330022, Jiangxi, China
Abstract: With the explosion of web documents, text classification has become more important in information retrieval applications. High feature dimensionality makes the statistical characteristics of the samples very difficult to estimate, which leads to over-fitting ("over study") and degrades classifier performance. Feature selection and extraction before analysis are therefore necessary. A class information feature selection method is proposed that takes the class information of the training documents into account while retaining as much document information as possible. Experiments show that the method performs well and is consistently better than OCFS and CHI (the chi-square statistic) on macro-averaged F1.
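The first abstract highlights gains on micro-averaged measures, while the second reports macro-averaged F1, and the two averages can diverge sharply when class sizes are unbalanced. A minimal sketch of the difference, assuming single-label classification (each document has exactly one true and one predicted class); the function name `micro_macro_f1` is hypothetical. Micro-averaging pools true positives, false positives, and false negatives over all classes before computing F1, so large classes dominate; macro-averaging computes F1 per class and then averages, so every class counts equally.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN), guarding against a zero denominator.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so each class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so large classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


if __name__ == "__main__":
    # 3 of 4 documents belong to "a"; the classifier always predicts "a".
    micro, macro = micro_macro_f1(["a", "a", "a", "b"], ["a"] * 4)
    print(round(micro, 3), round(macro, 3))  # -> 0.75 0.429
```

Here the classifier ignores the small class "b" entirely, yet micro-F1 stays at 0.75 because the pooled counts are dominated by "a"; macro-F1 falls to about 0.43 because the F1 of "b" is 0.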
Keywords: feature selection; text classification; between-class distribution; within-class distribution
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.