Research on text categorization based on LSI and Rough sets
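Cite this article: ZHAO Shun, CHI Cheng-ying. Research on text categorization based on LSI and Rough sets[J]. Journal of Anshan University of Science and Technology, 2005, 28(5): 346-349,355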
Authors: ZHAO Shun, CHI Cheng-ying
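Affiliations: [1] School of Higher Vocational Technology, Anshan University of Science and Technology, Anshan, Liaoning 114044, China; [2] School of Computer Science and Engineering, Anshan University of Science and Technology, Anshan, Liaoning 114044, China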
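Abstract: Traditional text categorization algorithms based on the vector space model (VSM) ignore the interdependence among the feature vectors of the VSM, and the set of terms that makes up the VSM cannot fully and accurately reflect a text's content, so classification accuracy is unsatisfactory. To address these problems, a text categorization method based on LSI and rough sets is proposed. Latent semantic indexing (LSI) is introduced while the VSM is being constructed, so that semantic relations are represented in the VSM and the dimensionality of the vector space is reduced. Rule-reasoning methods from rough set theory are then used to build a rule base for text categorization; any unknown text is classified simply by similarity-matching its condition attributes against the rules in the rule base. Experiments show that, compared with the traditional VSM-based method, the proposed method improves both classification precision and efficiency by more than 10%.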

Keywords: latent semantic indexing (LSI); rough sets; text categorization
Article ID: 1672-4410(2005)05-0346-04
Received: 2004-10-15
Revised: 2004-10-15
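LSI, as the abstract uses it, is normally realized as a truncated singular value decomposition (SVD) of the VSM's term-by-document matrix: keeping only the k largest singular triples maps every text into a k-dimensional latent space in which co-occurring terms are merged. The following is a minimal sketch of that step, not the authors' implementation. It assumes raw term frequencies, computes the SVD by power iteration with deflation in plain Python so that no numerical library is needed, and folds texts into the reduced space with q_hat = inverse(Sigma_k) * U_k^T * q. The function names, the toy matrix and k = 2 are hypothetical.

```python
import math
import random


def _matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(x, y):
    nx, ny = _norm(x), _norm(y)
    return sum(a * b for a, b in zip(x, y)) / (nx * ny) if nx and ny else 0.0


def truncated_svd(a, k, iters=200, seed=0):
    """Top-k singular triples (sigma, u, v) of a terms x documents matrix,
    found by power iteration on W^T W followed by deflation W <- W - sigma u v^T."""
    rng = random.Random(seed)
    work = [row[:] for row in a]
    n_docs = len(a[0])
    triples = []
    for _ in range(k):
        wt = _transpose(work)
        v = [rng.random() for _ in range(n_docs)]
        for _ in range(iters):
            w = _matvec(wt, _matvec(work, v))  # w = (W^T W) v
            nw = _norm(w)
            if nw == 0:
                break
            v = [x / nw for x in w]
        av = _matvec(work, v)
        sigma = _norm(av)
        if sigma == 0:  # the matrix has rank below k
            break
        u = [x / sigma for x in av]
        triples.append((sigma, u, v))
        # Remove the found component so the next pass finds the next singular triple.
        work = [[work[i][j] - sigma * u[i] * v[j] for j in range(n_docs)]
                for i in range(len(work))]
    return triples


def fold_in(triples, term_vector):
    """Map a term-frequency vector into the k-dim LSI space: inverse(Sigma_k) * U_k^T * q."""
    return [sum(ui * qi for ui, qi in zip(u, term_vector)) / sigma
            for sigma, u, _ in triples]


# Hypothetical toy corpus. Rows are terms (ball, goal, stock, market),
# columns are documents (two sports texts, then two finance texts).
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]

svd = truncated_svd(A, k=2)
# Reduced document vectors: each 4-term column becomes a 2-dim LSI vector.
docs_lsi = [fold_in(svd, [row[j] for row in A]) for j in range(len(A[0]))]
query = fold_in(svd, [1, 1, 0, 0])  # an unseen text mentioning "ball" and "goal"
```

In this toy matrix the two sports texts collapse onto the same LSI vector, the sports and finance texts become orthogonal, and the unseen query lands next to the sports texts, which is the kind of semantic grouping the abstract attributes to LSI. A real system would call a library SVD (for example NumPy's) rather than use power iteration.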

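The second stage described in the abstract turns the reduced representation into a rough-set rule base and classifies an unknown text by similarity-matching its condition attributes against the rules. The abstract does not specify the discretization, attribute reduction or similarity measure, so the sketch below fills them in with labeled assumptions. Condition attributes are taken to be already-discretized LSI coordinates. Rules are read off the equivalence (indiscernibility) classes that fall entirely inside one category, i.e. that category's lower approximation. The hypothetical similarity is the fraction of condition attributes on which a rule agrees, with ties broken by rule support. Attribute reduction (computing a reduct) is omitted, and the decision table and function names are invented for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Group object indices by their values on the given condition attributes
    (the rough-set indiscernibility relation)."""
    classes = defaultdict(list)
    for i, (cond, _) in enumerate(table):
        classes[tuple(cond[a] for a in attrs)].append(i)
    return classes


def certain_rules(table):
    """Build rules 'IF condition values THEN category' from equivalence classes
    that lie entirely in one category, i.e. in that category's lower approximation."""
    attrs = range(len(table[0][0]))
    rules = []
    for key, members in equivalence_classes(table, attrs).items():
        labels = {table[i][1] for i in members}
        if len(labels) == 1:  # consistent class -> certain rule
            rules.append((key, labels.pop(), len(members)))
    return rules


def classify(rules, cond):
    """Assumed similarity matching: choose the rule that agrees with `cond` on the
    largest fraction of attributes; break ties by support (training texts covered)."""
    def score(rule):
        key, _, support = rule
        matches = sum(1 for r, c in zip(key, cond) if r == c)
        return (matches / len(key), support)
    best = max(rules, key=score)
    return best[1], score(best)[0]


# Hypothetical decision table: discretized LSI coordinates -> category.
TABLE = [
    ((2, 0, 1), "sports"),
    ((2, 0, 1), "sports"),
    ((2, 1, 0), "sports"),
    ((0, 2, 1), "finance"),
    ((0, 2, 2), "finance"),
    ((1, 1, 1), "sports"),
    ((1, 1, 1), "finance"),  # conflicts with the row above, so no certain rule
]

RULES = certain_rules(TABLE)
label, similarity = classify(RULES, (2, 0, 2))
```

Here the two rows with condition values (1, 1, 1) but different categories form an inconsistent class in the boundary region, so they yield no certain rule; the unknown text (2, 0, 2) agrees with the first sports rule on two of three attributes and is assigned to sports.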
This article is indexed in CNKI, VIP (Weipu) and other databases.