
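Research on SVM Classification of Chinese Web Pages Based on Information Gain
Cite this article:PAN Zhengcai, CHEN Haiguang. Research on SVM classification of Chinese web pages based on information gain[J]. Journal of Shanghai Normal University (Natural Sciences), 2013, 42(3): 277-282.
Authors:PAN Zhengcai, CHEN Haiguang
Institution:College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234
Funding:Scientific Research Innovation Project of the Shanghai Municipal Education Commission (09YZ154)
Abstract:This work optimizes and improves on the shortcomings of feature dimensionality reduction methods and of the traditional information gain method in Chinese web page text classification, with the aim of raising classification efficiency and accuracy. First, part-of-speech filtering and synonym merging are applied to the feature items as an initial dimensionality reduction. Next, an improved information gain method is proposed to compute feature weights for the feature items. Finally, the Support Vector Machine (SVM) classification algorithm is used to classify the Chinese web pages. Both theoretical analysis and experimental results show that this method achieves better performance and classification results than the traditional method.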

Keywords:information gain method  part-of-speech filtering  synonym merging  feature weighting  Support Vector Machine
Received:2013/4/3

Research on Chinese web page SVM classifier based on information gain
PAN Zhengcai and CHEN Haiguang. Research on Chinese web page SVM classifier based on information gain[J]. Journal of Shanghai Normal University (Natural Sciences), 2013, 42(3): 277-282.
Authors:PAN Zhengcai and CHEN Haiguang
Institution:(College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China)
Abstract:To improve the efficiency and accuracy of text classification for Chinese web pages, this paper addresses the shortcomings of existing feature dimensionality reduction methods and of the traditional information gain method. First, part-of-speech filtering and synonym merging are applied to the feature items as an initial dimensionality reduction step. Then, an improved information gain method is proposed for computing the weights of the feature items. Finally, a Support Vector Machine (SVM) classifier is used to classify the Chinese web pages. Both theoretical analysis and experimental results show that this method achieves better performance and classification results than the traditional method.
Keywords:information gain method  part-of-speech filtering  synonyms merging  feature weighting  Support Vector Machine
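
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classic (traditional) information gain score for a single term. It is not the improved method proposed in the paper, whose details the abstract does not give. The function names `entropy` and `information_gain`, and the representation of each document as a set of terms, are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Classic IG of `term`: H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs   -- list of sets of terms, one set per document (hypothetical format)
    labels -- list of class labels aligned with docs
    """
    n = len(docs)
    if n == 0:
        return 0.0
    all_counts = Counter(labels)
    # Class counts among documents that contain the term.
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    # Class counts among documents that do not contain the term.
    without_term = all_counts - with_term
    n_with = sum(with_term.values())
    n_without = n - n_with
    return (
        entropy(all_counts.values())
        - (n_with / n) * entropy(with_term.values())
        - (n_without / n) * entropy(without_term.values())
    )
```

Under this sketch, a term that appears in every document of one class and in none of the other (for two equally sized classes) scores 1 bit, while a term spread evenly across both classes scores 0. Terms with higher scores would be the ones kept or weighted more heavily before the vectors are passed to an SVM.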
This article is indexed in CNKI, VIP (维普), and other databases.