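Distributed overlapping community discovery algorithm based on community forest model
Cite this article: ZHANG Yan, LIU Bin, MEI Wei, XU Yunfeng, GU Lidong, YU Pengshuai, SHI Yu, WEI Xifeng. Distributed overlapping community discovery algorithm based on community forest model[J]. Journal of Hebei University of Science and Technology, 2022, 43(2): 194-203. DOI: 10.7535/hbkd.2022yx02009
Authors: ZHANG Yan  LIU Bin  MEI Wei  XU Yunfeng  GU Lidong  YU Pengshuai  SHI Yu  WEI Xifeng
Affiliations: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018; School of Economics and Management, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, and Research Center of Big Data and Social Computing, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018; Shijiazhuang Campus, Army Engineering University of PLA, Shijiazhuang, Hebei 050005; School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, and Research Center of Big Data and Social Computing, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018
Funding: National Culture and Tourism Science and Technology Innovation Project (2020); Hebei Provincial Science and Technology Program (20310802D, 21310101D)
Abstract: Overlapping community discovery is a basic and important task in complex network mining. It can be applied to the data analysis of social networks, communication networks, protein interaction networks, metabolic pathway networks, transportation networks, and other networks, and thereby serves fields such as intelligent transportation, infectious disease prevention and control, public opinion analysis, new drug development, and human resource management. Traditional stand-alone computing architectures can no longer meet the analysis and computing requirements of large-scale complex networks. Researchers in artificial intelligence have proposed applying community discovery to network...

Keywords: distributed processing system  social network  overlapping community  community forest model  community discovery
Received: 2021-11-09
Revised: 2021-12-21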

Distributed overlapping community discovery algorithm based on community forest model
ZHANG Yan, LIU Bin, MEI Wei, XU Yunfeng, GU Lidong, YU Pengshuai, SHI Yu, WEI Xifeng. Distributed overlapping community discovery algorithm based on community forest model[J]. Journal of Hebei University of Science and Technology, 2022, 43(2): 194-203. DOI: 10.7535/hbkd.2022yx02009
Authors: ZHANG Yan  LIU Bin  MEI Wei  XU Yunfeng  GU Lidong  YU Pengshuai  SHI Yu  WEI Xifeng
Abstract: Overlapping community discovery is a basic and important task in complex network mining. It can be applied to the data analysis of social networks, communication networks, protein interaction networks, metabolic pathway networks, transportation networks, and other networks, and thereby serves fields such as intelligent transportation, infectious disease prevention and control, public opinion analysis, new drug development, and human resource management. Traditional stand-alone computing architectures struggle to meet the analysis and computing requirements of large-scale complex networks. Researchers in artificial intelligence have proposed applying community discovery to network representation learning to enrich the features of nodes and edges in a network. However, traditional overlapping community discovery algorithms were not designed with the requirements of network representation learning in mind: they focus only on assigning nodes to communities and give little consideration to the internal structure and external boundary of a community. For example, they do not address the weight of nodes within a community or the ranking of a node's membership across multiple communities, so they cannot provide richer feature information about the nodes and communities in a network and offer insufficient support for network representation learning tasks.
To address the problem that traditional single-machine overlapping community discovery algorithms are unsuited to large-scale complex network mining and cannot meet the requirements of network representation learning tasks, a distributed overlapping community discovery algorithm based on the community forest model (DCFM algorithm) was proposed. First, the network dataset was stored in a distributed file system and divided into blocks, and a distributed computing framework was used to execute the CFM algorithm on each data block. Then, the resulting communities were merged. Finally, the community division results were summarized. The algorithm was run on a Spark cluster with the real DBLP dataset and evaluated by F value and running time. The results show that the mean F value of the DCFM algorithm is slightly lower than that of the CFM algorithm, but its running time decreases linearly as the number of nodes increases. At the cost of a small loss in mean F value, the DCFM algorithm gains the ability to process large-scale network data. The number of data splits has a large impact on computation time: on the com-dblp.ungraph.txt dataset, the CFM algorithm needs 192 min to process the data, whereas the DCFM algorithm needs about 91 min when the data are divided into 6 parts and only about 13 min when they are divided into 100 parts. Therefore, on a big data platform, the DCFM algorithm, which uses a distributed computing framework to divide and merge communities, is a feasible method for mining large-scale complex networks. Dividing the network greatly improves the speed of community division and the efficiency of community discovery.
Keywords: distributed processing system   social networks   overlapping communities   community forest model   community discovery
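
The split, per-block detection, and merge steps described in the abstract can be illustrated with a minimal single-process sketch. Everything below is an assumption made for illustration, since the abstract gives neither the internals of CFM nor the merge rule: `local_communities` is a stand-in that returns connected components (it is not the CFM algorithm), `split_edges` is a simple round-robin partitioner, and `merge_communities` uses a hypothetical Jaccard-overlap threshold. It runs in plain Python rather than on a Spark cluster.

```python
from itertools import combinations


def split_edges(edges, num_blocks):
    """Deal edges round-robin into num_blocks blocks (assumed partitioner)."""
    blocks = [[] for _ in range(num_blocks)]
    for i, edge in enumerate(edges):
        blocks[i % num_blocks].append(edge)
    return blocks


def local_communities(block):
    """Stand-in for CFM (NOT the real algorithm): connected components of a block."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in block:
        parent[find(u)] = find(v)  # union the two endpoints

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """Jaccard similarity of two non-empty node sets."""
    return len(a & b) / len(a | b)


def merge_communities(communities, threshold=0.5):
    """Merge pairs whose Jaccard overlap reaches the (assumed) threshold."""
    merged = [set(c) for c in communities]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if jaccard(merged[i], merged[j]) >= threshold:
                merged[i] |= merged[j]
                del merged[j]
                changed = True
                break  # indices shifted, so restart the scan
    return merged


def dcfm_sketch(edges, num_blocks, threshold=0.5):
    """Split the edge list, detect communities per block, then merge them."""
    found = []
    for block in split_edges(edges, num_blocks):
        found.extend(local_communities(block))
    return merge_communities(found, threshold)


# Two disjoint triangles, processed in two blocks.
example_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
communities = dcfm_sketch(example_edges, num_blocks=2)
print(sorted(sorted(c) for c in communities))
```

Because communities are merged only when their overlap reaches the threshold, two communities that share just a few nodes stay separate, so a node can still belong to more than one community in the output. In a distributed setting the loop over blocks would be replaced by a parallel map over data blocks.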
This article is indexed in Wanfang Data and other databases.