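Short Text Classification Using an LDA Model Based on Information Gain
Citation: SHEN Jing. Short text classification using an LDA model based on information gain[J]. Journal of Chongqing University of Arts and Sciences (Natural Science Edition), 2011, 30(6): 64-66.
Author: SHEN Jing
Affiliation: Library, Logistical Engineering University of the PLA, Shapingba, Chongqing 401311
Abstract: Building on LDA-based short text classification, this paper proposes an improved method that combines information gain with LDA. The method uses information gain to measure how much each word contributes to text classification, raises the weight of "effective words" (words that contribute to classification), and filters out "non-effective words". It then builds an LDA topic model of the filtered short texts and constructs the text category models with the centroid (center vector) method. Experiments show that the classification performance of the method improves considerably as the proportion of effective words is reduced.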

Keywords: information gain; LDA model; text classification
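The information-gain step described in the abstract can be illustrated with the minimal Python sketch below. It is not the author's implementation: it assumes each short text is already tokenized and treats a word as a binary feature (present or absent in a text), and the helper names `information_gain` and `filter_docs` and the `keep_ratio` parameter are hypothetical. The abstract does not specify how the weights of effective words are raised, so the sketch only scores words and filters out the low-scoring ones.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using term presence.

    Assumption: binary term presence; the paper's re-weighting of
    effective words is not given in the abstract and is omitted here.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_total = Counter(labels)
    h_c = entropy([class_total[c] for c in classes])
    # For each term, count the documents of each class that contain it.
    with_term = {}
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            with_term.setdefault(t, Counter())[y] += 1
    ig = {}
    for t, cnt in with_term.items():
        n_t = sum(cnt.values())
        present = [cnt[c] for c in classes]
        absent = [class_total[c] - cnt[c] for c in classes]
        h_cond = (n_t / n) * entropy(present) + ((n - n_t) / n) * entropy(absent)
        ig[t] = h_c - h_cond
    return ig


def filter_docs(docs, ig, keep_ratio):
    """Keep the top keep_ratio fraction of terms by IG (hypothetical parameter)."""
    ranked = sorted(ig, key=lambda t: (-ig[t], t))  # ties broken alphabetically
    kept = set(ranked[:max(1, int(len(ranked) * keep_ratio))])
    return [[t for t in tokens if t in kept] for tokens in docs], kept


docs = [["ball", "goal", "team"], ["goal", "match"],
        ["vote", "party"], ["party", "election", "team"]]
labels = ["sport", "sport", "politics", "politics"]
ig = information_gain(docs, labels)
filtered, kept = filter_docs(docs, ig, keep_ratio=0.3)
print(sorted(kept))  # ['goal', 'party']
print(filtered)      # [['goal'], ['goal'], ['party'], ['party']]
```

In this toy data "goal" and "party" each identify a class perfectly (IG of 1 bit), while "team" occurs once in each class and carries no information (IG of 0). Lowering `keep_ratio` plays the role of the reduced proportion of effective words mentioned in the abstract.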

Short text classification with an LDA model based on information gain
Abstract: This paper improves LDA-based short text classification by proposing a method that combines information gain with LDA. Information gain is used to compute how much each word contributes to text classification; the weight of "effective words" is increased and "non-effective words" are filtered out. The filtered short texts are then modeled with LDA, and the centroid (center vector) method is used to build the text category models. Experimental results show that the classification performance of the method improves markedly as the proportion of effective words is reduced.
Keywords: information gain; LDA model; text classification
This article is indexed in Wanfang Data and other databases.
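The last two steps in the abstract, LDA topic modeling of the filtered short texts and a centroid ("center vector") category model, might look like the minimal Python sketch below. It is not the author's implementation. It assumes collapsed Gibbs sampling for LDA (the abstract does not say how the model was fitted), hypothetical hyperparameters `alpha`, `beta` and `iters`, cosine similarity for comparing a text with each class center, and, for brevity, fits LDA on training and test texts together; all function names are hypothetical.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions.

    Assumption: Gibbs sampling and these hyperparameters are illustrative only.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


def centroids(vectors, labels):
    """Center vector of each class: the mean of its members' topic vectors."""
    sums, counts = {}, {}
    for vec, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for j, x in enumerate(vec):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(vec, cents):
    """Assign the class whose center vector is most cosine-similar to vec."""
    return max(cents, key=lambda y: cosine(vec, cents[y]))


# Usage: the first four texts are labeled training data, the last is unlabeled.
texts = [["goal", "goal"], ["goal"], ["party", "party"], ["party"], ["goal"]]
theta = lda_gibbs(texts, n_topics=2)
cents = centroids(theta[:4], ["sport", "sport", "politics", "politics"])
print(classify(theta[4], cents))
```

Because each text is represented by its topic proportions, a class center is simply the average topic distribution of that class's training texts, and a new text is assigned to the nearest center. A fuller setup would infer topics for test texts against the topic-word counts learned from the training set only.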