
Topic Mining of Online Medical Reviews Based on Improved LDA
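Cite this article:GAO Hui-ying, LIU Jia-wei, YANG Shu-xin. Topic Mining of Online Medical Reviews Based on Improved LDA[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2019, 39(4): 427-434. DOI: 10.15918/j.tbit1001-0645.2019.04.015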
Authors:GAO Hui-ying  LIU Jia-wei  YANG Shu-xin
Affiliation:School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China (all three authors)
Funding:National Natural Science Foundation of China (71572013)
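Abstract:
This paper studies the use of topic models to mine healthcare service topics. To address the semantic sparsity and insufficient co-occurrence information that arise when the LDA topic model is applied to topic mining of medical reviews, it proposes CO-LDA, a model that combines word co-occurrence analysis with the LDA topic model. First, word co-occurrence analysis is applied to the review corpus to obtain a word co-occurrence matrix. Next, the LDA topic model is used to model the reviews in the corpus and to mine the aspects of healthcare services that patients care about. CO-LDA and the traditional LDA model are then compared on medical review topic mining using three indicators: the average minimum JS distance, the average Kendall rank correlation coefficient τb, and the average TF-IDF. The experiments show that CO-LDA outperforms LDA in both topic consistency and topic quality. The experimental results also agree closely with China's Hospital Evaluation Standards (《医院评价标准》), which indicates that the CO-LDA-based method for mining topics from online medical reviews is effective.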
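
The word co-occurrence step can be illustrated with a minimal sketch. The abstract does not say how the co-occurrence matrix is built or how it is combined with LDA, so everything here is an assumption: co-occurrence is counted at the level of a whole review (rather than a sliding window), the function name `cooccurrence_counts` is hypothetical, and the sample reviews are invented for illustration and are not the paper's data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, the number of documents containing both words."""
    counts = Counter()
    for tokens in docs:
        # Use the distinct words of a review so each pair counts once per review.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical, pre-tokenized reviews (illustrative only).
reviews = [
    ["doctor", "patient", "attitude"],
    ["doctor", "attitude", "waiting"],
    ["waiting", "registration"],
]

pairs = cooccurrence_counts(reviews)
# Keys are alphabetically ordered pairs; a missing pair reads as 0.
print(pairs[("attitude", "doctor")])
```

Storing the counts in a sparse `Counter` keyed by ordered pairs avoids allocating a dense vocabulary-by-vocabulary matrix, which matters for short, sparse texts such as reviews.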

Keywords:topic extraction  healthcare service  semantic sparsity  CO-LDA  word co-occurrence analysis
Received:2018-04-13

Identifying Topics of Online Healthcare Reviews Based on Improved LDA
GAO Hui-ying, LIU Jia-wei, YANG Shu-xin. Identifying Topics of Online Healthcare Reviews Based on Improved LDA[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2019, 39(4): 427-434. DOI: 10.15918/j.tbit1001-0645.2019.04.015
Authors:GAO Hui-ying  LIU Jia-wei  YANG Shu-xin
Affiliation:School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China
Abstract:
This paper studies the use of topic models to identify healthcare service topics. To address the semantic sparsity and lack of co-occurrence information that the LDA topic model faces when extracting topics from healthcare reviews, a CO-LDA model is proposed that combines word co-occurrence analysis with the LDA topic model. First, word co-occurrence analysis is applied to the review corpus to obtain a word co-occurrence matrix. Second, the LDA topic model is used to represent the reviews in the corpus, and a hierarchical clustering algorithm is then used to classify the features. Finally, the healthcare service quality factors that patients focus on are identified. The CO-LDA model is compared with the traditional LDA model on three indicators: the average minimum JS distance, the average Kendall rank correlation coefficient τb, and the average TF-IDF. The experiments show that CO-LDA achieves better topic consistency and topic quality than LDA. The experimental results are also highly consistent with China's "Hospital Evaluation Standards", which demonstrates the effectiveness of the CO-LDA-based method for mining topics from online medical reviews.
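
Two of the evaluation indicators named above can be sketched in plain Python. The abstract does not define them precisely, so this is a sketch under stated assumptions: `js_divergence` is the standard base-2 Jensen-Shannon divergence (its square root is the metric usually called JS distance), `kendall_tau_b` is the standard tie-corrected Kendall coefficient, and `average_min_js` is one plausible reading of "average minimum JS distance" (for each topic, the smallest divergence to any other topic, averaged over topics). All function names and the toy topic distributions are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either ranking."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both rankings: excluded from the denominator
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def average_min_js(topics):
    """Assumed reading: mean over topics of the smallest JS divergence to another topic."""
    mins = [
        min(js_divergence(t, u) for j, u in enumerate(topics) if j != i)
        for i, t in enumerate(topics)
    ]
    return sum(mins) / len(mins)


# Toy topic-word distributions over a three-word vocabulary (illustrative only).
topics = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(average_min_js(topics))
print(kendall_tau_b([1, 2, 2], [1, 2, 3]))
```

Under this reading, a larger average minimum divergence means the topics are better separated from one another; whether the paper uses the divergence or its square root is not stated in the abstract.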
Keywords:topic extraction  healthcare service  semantic sparsity  CO-latent Dirichlet allocation  word co-occurrence analysis
This article is indexed in CNKI, Wanfang Data, and other databases.