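Solving the attribute-overlap problem in text categorization with result pruning
Cite this article: LI Min, SHEN Xiang, SHAO Dong, GAO Yang. Solving the attribute-overlap problem in text categorization with result pruning [J]. Journal of Yangzhou University (Natural Science Edition), 2006, 9(3): 63-66.
Authors: LI Min, SHEN Xiang, SHAO Dong, GAO Yang
Affiliations: 1. School of Software, Nanjing University, Nanjing 210089
2. Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Funding: National Natural Science Foundation of China (60475026)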
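Abstract: In text categorization, when two multi-attribute categories have overlapping attributes, traditional text categorization algorithms reach a macro F1 of only about 45%. To raise the macro F1, a method based on result pruning is proposed. In this method, the classifier consists of several sub-classifiers, each corresponding to one attribute of the category. In each phase, a sub-classifier discards the texts that do not have its attribute. Once all sub-classifiers have run, the texts that remain are the ones that belong to the category. Experimental data show that, on the attribute-overlap problem, the result-pruning method raises the macro F1 to about 65%.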

Keywords: text categorization; text mining; multi-attribute; attribute overlap; result pruning
Article ID: 1007-824X(2006)03-0063-04
Received: 2006-01-04
Revised: 2006-01-04
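
The abstract reports its results as macro F1 but does not define the metric. As a reminder of the standard definition, the sketch below computes macro F1 as the unweighted mean of the per-category F1 scores. It assumes a single-label setting, which may differ from the paper's evaluation setup, and the function name `macro_f1` and the toy labels are illustrative only.

```python
from typing import List


def macro_f1(gold: List[str], predicted: List[str]) -> float:
    """Unweighted mean of per-category F1 scores (single-label assumption)."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)  # true positives
        fp = sum(g != label and p == label for g, p in pairs)  # false positives
        fn = sum(g == label and p != label for g, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the category never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, not taken from the paper.
gold = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
score = macro_f1(gold, predicted)
```

Because every category counts equally, a category that is classified poorly pulls the macro average down regardless of how many texts it contains.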

Result-prune based solution for overlapped attributes confusion in multi-attribute text categorization
LI Min, SHEN Xiang, SHAO Dong, GAO Yang. Result-prune based solution for overlapped attributes confusion in multi-attribute text categorization [J]. Journal of Yangzhou University (Natural Science Edition), 2006, 9(3): 63-66.
Authors: LI Min, SHEN Xiang, SHAO Dong, GAO Yang
Abstract: In practical text categorization tasks, some categories contain more than one attribute. When the attributes of two such categories overlap, traditional text categorization methods give unacceptable accuracy: the macro F1 value is only 45%. To improve the macro F1 value, the result-prune method is proposed. In this method, the classifier is divided into several sub-classifiers, each matching one attribute. In each phase, a sub-classifier discards the texts that do not belong to its attribute. After all the texts have been filtered by these sub-classifiers, the texts that remain are the ones belonging to the category. The experiment shows that this method can increase the macro F1 value of such text categorization tasks to 65%.
Keywords: text categorization; text mining; multi-attribute; overlapped attributes; result prune
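
The pruning cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the sub-classifiers are trained, so simple keyword matchers stand in for them here, and the category names, keywords, and sample texts are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical sub-classifier type: returns True if a text has the attribute.
AttributeTest = Callable[[str], bool]


def keyword_test(keywords: List[str]) -> AttributeTest:
    """Toy stand-in for a trained sub-classifier: accept a text containing any keyword."""
    lowered = [k.lower() for k in keywords]
    return lambda text: any(k in text.lower() for k in lowered)


def result_prune(texts: List[str], sub_classifiers: List[AttributeTest]) -> List[str]:
    """Run the sub-classifiers in sequence; each phase discards the texts it rejects."""
    remaining = list(texts)
    for accepts in sub_classifiers:
        remaining = [t for t in remaining if accepts(t)]
    return remaining


# Two hypothetical categories that overlap on the "football" attribute.
categories: Dict[str, List[AttributeTest]] = {
    "football_in_china": [keyword_test(["football"]), keyword_test(["china", "beijing"])],
    "football_in_italy": [keyword_test(["football"]), keyword_test(["italy", "milan"])],
}

texts = [
    "Beijing hosts a football derby",
    "Milan wins the football league",
    "China reports record grain harvest",
]

# Texts that survive every phase are assigned to the category.
assigned = {name: result_prune(texts, subs) for name, subs in categories.items()}
```

In this sketch the shared attribute alone cannot separate the two categories; only the texts that pass every attribute test of a category are kept for it, which is the filtering behaviour the abstract describes.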
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.