
A large-scale social network community detection prototype system based on Spark (基于Spark的大规模社交网络社区发现原型系统)
Cite this article: YE Xiaorong, SHAO Qing. A large-scale social network community detection prototype system based on Spark[J]. Science & Technology Review, 2018, 36(23): 93-101.
Authors: YE Xiaorong (叶小榕), SHAO Qing (邵晴)
Institutions: 1. Institute of Scientific and Technical Information of China, Beijing 100038, China;
2. KNET Co., Ltd., Beijing 100190, China
Abstract: To mine user information in large-scale social networks effectively and to better understand the relationships among users, a Spark-based community detection prototype system was designed and developed. The system uses ActiveMQ to crawl large volumes of user data, cleans the data with the naive Bayes algorithm provided by Spark MLlib, and ranks users with the PageRank algorithm provided by Spark GraphX together with the Z-Score algorithm provided by MLlib. Finally, it applies and optimizes the label propagation algorithm (LPA) to quickly group users with similar features and close ties into the same community, laying a foundation for further analysis and use of community user data.

Keywords: Spark; GraphX; MLlib; community detection
Received: 2018-10-09
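
The ranking and community-detection stages named in the abstract can be illustrated with a minimal single-machine sketch. It is plain Python rather than Spark code, and it does not reproduce the authors' implementation or their LPA optimization, which the abstract does not describe. The function names (`pagerank`, `z_scores`, `label_propagation`), the toy graph, the damping factor and the tie-breaking rule are all assumptions made for illustration. Textbook LPA breaks ties at random; this sketch picks the largest label so that the result is reproducible.

```python
from collections import Counter
from statistics import mean, pstdev


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank on a dict {node: [out-neighbours]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without out-links spreads its rank over all nodes.
            targets = adj[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


def z_scores(values):
    """Standardise a dict of scores to zero mean and unit variance."""
    mu = mean(values.values())
    sigma = pstdev(values.values()) or 1.0  # guard against all-equal scores
    return {k: (x - mu) / sigma for k, x in values.items()}


def label_propagation(adj, max_rounds=20):
    """Asynchronous label propagation with a deterministic tie-break.

    Every node starts with its own label and repeatedly adopts the label
    that is most frequent among its neighbours (largest label on ties,
    an assumption made here for reproducibility).
    """
    labels = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = max(l for l, c in counts.items() if c == top)
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two triangles joined by the edge 3-4 (undirected,
# so each edge is listed in both directions).
graph = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
scores = z_scores(pagerank(graph))      # standardised user ranking
communities = label_propagation(graph)  # node -> community label
```

On this toy graph the two triangles end up with different labels, and the two bridge nodes (3 and 4) receive above-average standardised scores. In a Spark deployment the same roles would be filled by distributed graph and machine-learning routines rather than these in-memory loops.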
This article is indexed in CNKI and other databases.