
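Research on Parallel LDA Topic Modeling Method
Cite this article: WANG Xu-ren, YAO Ye-peng, RAN Chun-feng, HE Fa-mei. Research on Parallel LDA Topic Modeling Method[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2013, 33(6): 590-593.
Authors: WANG Xu-ren, YAO Ye-peng, RAN Chun-feng, HE Fa-mei
Affiliations: Information Engineering College, Capital Normal University, Beijing 100048; Library, Beijing Institute of Technology, Beijing 100081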
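Funding: National Natural Science Foundation of China (61272446); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality, "Young and Middle-aged Backbone Talents" program (PHR201008083)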
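Abstract: The latent Dirichlet allocation (LDA) model takes a long time to compute when it is used to analyze the latent topic information in large-scale document collections or corpora. To address this problem, a method for building a parallel LDA topic model based on the MapReduce architecture is proposed. A parallel implementation of LDA topic model construction is studied using a distributed programming model. Experiments on the Hadoop parallel computing platform show that, when processing large-scale text, the method achieves near-linear speedup and also improves the quality of the resulting topic model.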

Keywords: MapReduce architecture; parallel computing; latent Dirichlet allocation (LDA) model; topic modeling
Received: 1/5/2013

Research on Parallel LDA Topic Modeling Method
WANG Xu-ren, YAO Ye-peng, RAN Chun-feng, HE Fa-mei. Research on Parallel LDA Topic Modeling Method[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2013, 33(6): 590-593.
Authors: WANG Xu-ren, YAO Ye-peng, RAN Chun-feng, HE Fa-mei
Institution: 1. Information Engineering College, Capital Normal University, Beijing 100048, China; 2. Department of Library, Beijing Institute of Technology, Beijing 100081, China
Abstract: The latent Dirichlet allocation (LDA) model, when used to analyze the topic information hidden in a large document collection or corpus, suffers from long computation times. To overcome this, we propose a parallel LDA topic modeling method based on the MapReduce architecture, that is, a parallel implementation of LDA topic model construction using a distributed programming model. Experiments were carried out on the Hadoop parallel computing platform. The results show that, when processing large volumes of text, the proposed method achieves near-linear speedup and also improves the quality of the resulting topic model.
Keywords: MapReduce architecture; parallel computing; latent Dirichlet allocation (LDA) model; topic modeling
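
The abstract does not say which inference algorithm the authors parallelize, so the following is only an illustrative sketch of how LDA training can be split into map and reduce phases. It assumes approximate distributed collapsed Gibbs sampling: each map task resamples the topics of its own document partition against a private copy of the global topic-word counts, and the reduce step merges the per-partition count changes. It is plain Python rather than Hadoop code, the map tasks run sequentially here, and every function and parameter name (`map_gibbs_sweep`, `reduce_counts`, `parallel_lda`, `alpha`, `beta`) is hypothetical.

```python
import random


def count_topic_word(docs, assignments, num_topics, vocab_size):
    """Build topic-word counts from the current topic assignments."""
    counts = [[0] * vocab_size for _ in range(num_topics)]
    for doc, z_doc in zip(docs, assignments):
        for w, z in zip(doc, z_doc):
            counts[z][w] += 1
    return counts


def map_gibbs_sweep(docs, assignments, global_counts, alpha, beta, rng):
    """Map step (assumed): one Gibbs sweep over a single partition.

    Works on a private copy of the global counts and returns the new
    assignments plus the change (delta) in topic-word counts.
    """
    num_topics = len(global_counts)
    vocab_size = len(global_counts[0])
    local = [row[:] for row in global_counts]
    totals = [sum(row) for row in local]
    new_assignments = []
    for doc, z_doc in zip(docs, assignments):
        doc_topic = [0] * num_topics
        for z in z_doc:
            doc_topic[z] += 1
        z_new = list(z_doc)
        for i, w in enumerate(doc):
            old = z_new[i]
            # Remove the token from all counts before resampling it.
            doc_topic[old] -= 1
            local[old][w] -= 1
            totals[old] -= 1
            weights = [
                (doc_topic[k] + alpha) * (local[k][w] + beta)
                / (totals[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            new = rng.choices(range(num_topics), weights=weights)[0]
            z_new[i] = new
            doc_topic[new] += 1
            local[new][w] += 1
            totals[new] += 1
        new_assignments.append(z_new)
    delta = [
        [local[k][w] - global_counts[k][w] for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return new_assignments, delta


def reduce_counts(global_counts, deltas):
    """Reduce step (assumed): add every partition's delta to the global counts."""
    merged = [row[:] for row in global_counts]
    for delta in deltas:
        for k, row in enumerate(delta):
            for w, d in enumerate(row):
                merged[k][w] += d
    return merged


def parallel_lda(partitions, num_topics, vocab_size,
                 iterations=20, alpha=0.1, beta=0.01, seed=0):
    """Alternate map and reduce phases; the maps run sequentially in this sketch."""
    rng = random.Random(seed)
    assignments = [
        [[rng.randrange(num_topics) for _ in doc] for doc in part]
        for part in partitions
    ]
    counts = [[0] * vocab_size for _ in range(num_topics)]
    for part, assign in zip(partitions, assignments):
        initial = count_topic_word(part, assign, num_topics, vocab_size)
        counts = reduce_counts(counts, [initial])
    for _ in range(iterations):
        results = [
            map_gibbs_sweep(part, assign, counts, alpha, beta, rng)
            for part, assign in zip(partitions, assignments)
        ]
        assignments = [r[0] for r in results]
        counts = reduce_counts(counts, [r[1] for r in results])
    return counts, assignments
```

Documents are lists of integer word ids and a partition is a list of documents, so a call such as `parallel_lda([[[0, 1, 0]], [[2, 3, 3]]], num_topics=2, vocab_size=4)` returns the merged topic-word counts and the final per-token topic assignments. Because each map task only reads the shared counts and writes its own delta, the sweeps are independent within an iteration, which is the property that would let a MapReduce job distribute them across nodes.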
This article is indexed in CNKI, Wanfang Data, and other databases.