
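Text Categorization Using Concept Knowledge
Cite this article: DING Ze-ya, ZHANG Quan. Text Categorization Using Concept Knowledge[J]. Journal of Applied Sciences, 2013, 31(2): 197-203.
Authors: DING Ze-ya, ZHANG Quan
Affiliations: 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039
Funding: Supported by the National "863" High Technology Research and Development Program (No. 2012AA011102); the State Language Commission "Twelfth Five-Year Plan" Research Project (No. YB125-53); the Knowledge Innovation Program of the Institute of Acoustics, Chinese Academy of Sciences (No. Y154141431); and the Consulting Project of the Academic Divisions of the Chinese Academy of Sciences (No. Y129091211)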
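Abstract: Statistical methods cannot classify text from the perspective of semantic understanding. To address this, a text categorization method is proposed that uses concept knowledge from the hierarchical network of concepts (HNC). It has two parts: concept-based feature selection and classification by category relatedness degree. During feature selection, category key concepts are mined by computing the discrimination degree between concepts and categories, and are then used to refine the feature terms. Based on the category semantic information associated with the category key concepts, a method is proposed for computing the relatedness degree between a document and a category, and the category of a text is decided by this relatedness degree. Experiments show that the method effectively reduces the dimensionality of the feature space, improves classification efficiency while preserving classification quality, and slightly raises the F1 value. Compared with SVM, KNN and Bayes classifiers, its F1 value is clearly higher than those of the other three methods when the number of feature terms is small; its overall classification performance is comparable to SVM and better than KNN and Bayes.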

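Keywords: text categorization; hierarchical network of concepts; concept; concept discrimination degree; category relatedness degree
Received: 2011-08-26
Revised: 2012-01-08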

Text Categorization Based on Concept Knowledge
DING Ze-ya, ZHANG Quan. Text Categorization Based on Concept Knowledge[J]. Journal of Applied Sciences, 2013, 31(2): 197-203.
Authors: DING Ze-ya, ZHANG Quan
Institution: 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
Abstract: To bring semantic understanding to text categorization, this paper proposes a method based on concept knowledge in the hierarchical network of concepts (HNC). The method has two parts: feature selection using concepts, and text categorization according to category relatedness degree. Category key concepts are mined by computing the discrimination degree of concepts and are used to further reduce the dimensionality of the feature space. Based on category semantic information consisting of category key concepts and relatedness weights, a method for computing the relatedness degree between documents and categories is proposed, and a document's category relatedness degree is used as the measure for categorization. Experiments show that the proposed method effectively reduces the dimensionality of the feature space and increases efficiency while maintaining the effectiveness of text categorization. Compared with SVM, KNN and Bayes, the method achieves the best F1 values at higher feature reduction levels. In overall performance it is almost equivalent to SVM and better than KNN and Bayes.
Keywords:concept  concept discrimination  category relatedness  text categorization  hierarchical network of concepts  
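
The two-stage idea in the abstract (pick category key concepts by a discrimination score, then assign a document to the category it is most related to) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the discrimination score (a concept's per-category document rate divided by the sum of its rates over all categories), the relatedness measure (the average key-concept weight over a document's concepts), the function names, and the toy data. It does not use HNC concept knowledge and is not the authors' implementation.

```python
from collections import Counter, defaultdict


def key_concepts(train, top_k=3):
    """Pick key concepts per category from (category, concepts) pairs.

    Assumed discrimination score: the fraction of a category's documents
    containing the concept, normalized by the same fraction summed over
    all categories. This is an illustrative stand-in, not the paper's formula.
    """
    n_docs = Counter(cat for cat, _ in train)
    df = defaultdict(Counter)  # df[cat][concept] = docs in cat containing concept
    for cat, concepts in train:
        df[cat].update(set(concepts))
    vocab = {c for counts in df.values() for c in counts}

    profiles = {}
    for cat in n_docs:
        scores = {}
        for c in vocab:
            rate = df[cat][c] / n_docs[cat]
            total = sum(df[other][c] / n_docs[other] for other in n_docs)
            scores[c] = rate / total
        # Highest score first; ties broken alphabetically for determinism.
        best = sorted(scores, key=lambda c: (-scores[c], c))[:top_k]
        profiles[cat] = {c: scores[c] for c in best if scores[c] > 0}
    return profiles


def relatedness(concepts, profile):
    """Assumed relatedness: mean key-concept weight over the document's concepts."""
    if not concepts:
        return 0.0
    return sum(profile.get(c, 0.0) for c in concepts) / len(concepts)


def classify(concepts, profiles):
    """Return the category with the highest relatedness to the document."""
    return max(sorted(profiles), key=lambda cat: relatedness(concepts, profiles[cat]))


# Hypothetical toy corpus: each document is already reduced to concept labels.
train = [
    ("sports", ["match", "team", "score"]),
    ("sports", ["team", "coach", "match"]),
    ("finance", ["bank", "loan", "score"]),
    ("finance", ["bank", "market", "loan"]),
]
profiles = key_concepts(train, top_k=3)
label = classify(["team", "match", "score"], profiles)
```

In this toy corpus the concept "score" occurs equally often in both categories, so it gets a low score and is left out of both key-concept sets. That is the kind of feature-space reduction the abstract describes, here on invented data only.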