
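A Text Classification Method Based on an Improved TF-IDF Function
Cite this article: LU Zhong-ning, ZHANG Bao-wei. A Text Classification Method Based on an Improved TF-IDF Function[J]. Journal of Henan Normal University (Natural Science), 2012, 40(6): 158-160, 174.
Authors: LU Zhong-ning, ZHANG Bao-wei
Affiliations: 1. School of Software, Zhengzhou University of Light Industry, Zhengzhou 450002
2. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002
Funding: National Natural Science Foundation of China; Zhejiang Provincial Department of Education project
Abstract: To address the many problems that the traditional TF-IDF function causes by ignoring the relationships between feature terms, this paper studies the application of the TF-IDF function to text classification. Drawing on information theory, it identifies inter-class distribution and intra-class distribution as measures that characterize the latent relationships between feature terms, and on that basis proposes an improved TF-IDF function for text classification. Experiments show that the improved TF-IDF function is effective and feasible, that it recovers much of the association information between feature terms that the traditional method loses, and that it improves text classification accuracy.

Keywords: VSM; TF-IDF function; weight; text classification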

A Text Categorization Method Based on Improved TF-IDF Function
LU Zhong-ning, ZHANG Bao-wei. A Text Categorization Method Based on Improved TF-IDF Function[J]. Journal of Henan Normal University (Natural Science), 2012, 40(6): 158-160, 174.
Authors:LU Zhong-ning  ZHANG Bao-wei
Institution: a. School of Software; b. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
Abstract: To address the problems caused by neglecting the relationships between feature terms, this paper studies the application of the TF-IDF function in text categorization. Drawing on information theory, it identifies the latent relationship between terms as characterized by their distribution among classes and their distribution inside a class, and proposes an improved TF-IDF function that uses both kinds of distribution information. Experiments show that the improved method is feasible and effective, and that it improves the accuracy of text categorization.
Keywords: Vector Space Model (VSM); TF-IDF function; weighting; text categorization
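
The limitation that the abstract attributes to the traditional TF-IDF function can be illustrated with a minimal sketch. The code below implements only the classic weighting, tf(t, d) * log(N / df(t)), and is not the improved function proposed in the paper, whose formula is not given on this page. The toy corpus, the class labels, and the function name `tf_idf` are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF for tokenized documents: tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: two classes, two documents each.
labeled_docs = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["match", "team", "score"]),
    ("finance", ["stock", "team", "price"]),
    ("finance", ["stock", "goal", "price"]),
]

weights = tf_idf([tokens for _, tokens in labeled_docs])

# "match" occurs only in the sports class, while "goal" is split across
# both classes. Both appear in two documents, so classic TF-IDF gives
# them the same weight in the first document.
print(weights[0]["match"], weights[0]["goal"])
```

Because the class labels never enter the computation, a term concentrated in one class and a term spread evenly across classes are indistinguishable whenever their document frequencies match. This is the kind of information that a weighting based on inter-class and intra-class distribution would be expected to capture.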
This article is indexed in CNKI, Wanfang Data, and other databases.