
An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
Authors: Pei Zhili, Shi Xiaohu, Maurizio Marchese, Liang Yanchun
Affiliation: Department of
Funding: National Natural Science Foundation of China; Natural Science Foundation of Inner Mongolia
Abstract: Text categorization plays an important role in data mining, and feature selection is its most important step. Focusing on feature selection, we present an improved text frequency method that filters out low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf.idf method for evaluating characteristic weights. The proposed method is applied to the benchmark test set Reuters-21578 Top10 to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.

Keywords: text categorization, mutual information, feature selection, characteristic weights, classifier
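
The abstract names three stages: low-frequency feature filtering, mutual-information feature selection, and tf.idf weighting. The abstract does not give the improved formulas, so the sketch below uses only the conventional textbook versions of each stage as a point of reference; it is not the authors' method. All function names, the `min_df` threshold, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def filter_low_frequency(docs, min_df=2):
    """Drop terms that appear in fewer than min_df documents.

    min_df is a hypothetical threshold, not a value from the paper.
    """
    df = document_frequency(docs)
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs]


def mutual_information(docs, labels, term, category):
    """Conventional pointwise MI: log(P(t, c) / (P(t) * P(c))).

    Probabilities are estimated from document counts. Returns -inf
    when the term never occurs (or never co-occurs) with the category.
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0:
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(docs):
    """Conventional tf.idf weight: tf(t, d) * log(N / df(t))."""
    df = document_frequency(docs)
    n = len(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


# Toy corpus (made up for illustration; not Reuters-21578 data).
docs = [
    ["oil", "price", "rise"],
    ["oil", "export"],
    ["wheat", "price"],
    ["wheat", "crop", "rare"],
]
labels = ["crude", "crude", "grain", "grain"]

filtered = filter_low_frequency(docs, min_df=2)
mi_oil_crude = mutual_information(filtered, labels, "oil", "crude")
mi_price_crude = mutual_information(filtered, labels, "price", "crude")
weights = tf_idf(filtered)
```

In this toy corpus, "oil" occurs only in "crude" documents and so receives a positive score for that category, while "price" is spread evenly across both categories and scores zero. That gap is what a mutual-information selector uses to rank features.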

Citation: Pei Zhili, Shi Xiaohu, Maurizio Marchese, Liang Yanchun. An enhanced text categorization method based on improved text frequency approach and mutual information algorithm [J]. Progress in Natural Science, 2007, 17(12): 1494-1500.