
A Topic Text Network Construction Method Based on the PL-LDA Model
Cite this article: ZHANG Zhiyuan, HUO Weigang. A Topic Text Network Construction Method Based on the PL-LDA Model[J]. Complex Systems and Complexity Science, 2017, 14(1).
Authors: ZHANG Zhiyuan, HUO Weigang
Affiliations: 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; 2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300
Abstract: Labeled LDA can mine the probability distribution of words under a given topic, but it cannot analyze the associations among those topic words. PMI (Pointwise Mutual Information) can measure the correlation between two words, but it loses the connection to the given topic. Inspired by the way PMI counts word-pair co-occurrence frequencies within a fixed window, this paper proposes a topic model called PL-LDA (Pointwise Labeled LDA), which computes the joint probability distribution of word pairs under a given topic. Experiments on an aviation safety report dataset show that the results of PL-LDA are highly interpretable. A topic text network is then constructed from the PL-LDA results; besides reflecting the distribution of topic words, it also displays the complex relationships among them.

Keywords: topic model; text mining; complex network; PMI
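
The abstract says PL-LDA was motivated by the way PMI counts word-pair co-occurrences within a window. As a rough illustration of that counting step only, the sketch below estimates standard PMI, log2(p(x, y) / (p(x) p(y))), from sliding windows over a token list. It is not the PL-LDA model: the function name `windowed_pmi`, the default window size, and the choice to treat each window as one observation are assumptions made for this example, and the result is independent of any topic, which is exactly the limitation the abstract describes.

```python
import math
from collections import Counter
from itertools import combinations


def windowed_pmi(tokens, window=5):
    """Estimate PMI for word pairs co-occurring in a sliding window.

    Hypothetical helper for illustration; each window position counts
    as one observation, and a word is counted once per window.
    """
    word_counts = Counter()
    pair_counts = Counter()
    n_windows = 0

    # Slide the window over the token sequence (at least one window).
    for start in range(max(1, len(tokens) - window + 1)):
        span = sorted(set(tokens[start:start + window]))
        n_windows += 1
        word_counts.update(span)
        # Pairs come out as alphabetically ordered tuples.
        pair_counts.update(combinations(span, 2))

    pmi = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n_windows
        p_x = word_counts[x] / n_windows
        p_y = word_counts[y] / n_windows
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


if __name__ == "__main__":
    # Made-up tokens, not taken from the paper's dataset.
    sample = ["engine", "failure", "runway", "engine", "failure"]
    for pair, score in sorted(windowed_pmi(sample, window=2).items()):
        print(pair, round(score, 3))
```

Scores like these carry no topic label, so two words score the same regardless of which topic they appear under; the paper's contribution, per the abstract, is to compute word-pair joint probabilities conditioned on a given topic instead.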
This article is indexed in CNKI, Wanfang Data, and other databases.