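Research on a Parallel Computing Method for the Maximal Information Coefficient
Cite this article: Zhu Daoheng, Li Zhiqiang. Research on a parallel computing method for the maximal information coefficient[J]. Science Technology and Engineering, 2021, 21(34): 14625-14633.
Authors: Zhu Daoheng  Li Zhiqiang
Affiliations: School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang 524088; School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang 524088; Southern Marine Science and Engineering Guangdong Laboratory, Zhanjiang 524000
Funding: National Natural Science Foundation of China (42176167, 41676079); Guangdong Ocean University "Innovation and Strong School" Project (Q18307)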
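Abstract: To address the high time complexity and rapidly growing running time of the Maximal Information Coefficient (MIC) approximation algorithm on large-scale data, a Parallel Computing Maximal Information Coefficient (PCMIC) method is proposed. Under the Spark and Spark-Message Passing Interface (MPI) computing frameworks, the PCMIC algorithm is used to compute fourteen typical correlation relationships in parallel at different data sizes and noise levels. In addition, two representative relationships are selected to test the performance of PCMIC in both frameworks with different numbers of nodes. The experimental results show that: (1) compared with the original MIC approximation algorithm, PCMIC retains generality and equitability under both frameworks and also scales well; (2) as the data size and the number of nodes increase, the running time of PCMIC in both frameworks grows markedly more slowly than that of the MIC approximation algorithm, and its parallel speedup and efficiency under Spark-MPI are slightly better than under Spark; (3) Spark can support the scheduling of MPI tasks, which lays a theoretical and practical foundation for studying the integration of different parallel computing frameworks.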
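To make the quantity being parallelized concrete, the following is a minimal sketch of a grid-based MIC estimate. It is a deliberate simplification, not the MIC approximation algorithm and not PCMIC: it searches only equal-frequency grids, so it gives a lower bound on what an optimizing grid search would find. The function names (`equal_freq_bins`, `mutual_information`, `naive_mic`) are hypothetical, and the grid budget `n ** 0.6` is the default commonly used in the MIC literature, assumed here rather than taken from the paper.

```python
import numpy as np

def equal_freq_bins(values, k):
    """Assign each value to one of k roughly equal-frequency bins by rank."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * k) // len(values)

def mutual_information(bx, by, kx, ky):
    """Mutual information (in nats) between two arrays of bin labels."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (bx, by), 1)          # count points per grid cell
    joint /= joint.sum()                   # joint probabilities
    px = joint.sum(axis=1, keepdims=True)  # marginal over x bins
    py = joint.sum(axis=0, keepdims=True)  # marginal over y bins
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def naive_mic(x, y, alpha=0.6):
    """Simplified MIC: best normalized MI over equal-frequency grids.

    Assumption: grids are limited to kx * ky <= n ** alpha cells.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** alpha
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        bx = equal_freq_bins(x, kx)
        ky = 2
        while kx * ky <= budget:
            by = equal_freq_bins(y, ky)
            mi = mutual_information(bx, by, kx, ky)
            # Normalize so the score lies in [0, 1].
            best = max(best, mi / np.log(min(kx, ky)))
            ky += 1
        kx += 1
    return best

# Illustrative data only (not from the paper).
rng = np.random.default_rng(0)
x_lin = np.linspace(0.0, 1.0, 1000)
score_linear = naive_mic(x_lin, 2.0 * x_lin + 1.0)
score_noise = naive_mic(rng.random(1000), rng.random(1000))
```

A noiseless functional relationship scores close to 1 and independent noise scores close to 0. The nested loop over grid resolutions is the part whose cost grows with the sample size, which is why large data sets motivate a parallel formulation.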

Keywords: maximal information coefficient  parallel computing  Parallel Computing Maximal Information Coefficient (PCMIC)  Spark  Message Passing Interface (MPI)
Received: 2021/4/10
Revised: 2021/10/2

Parallel calculation method for Maximum Information Coefficient
Zhu Daoheng,Li Zhiqiang.Parallel calculation method for Maximum Information Coefficient[J].Science Technology and Engineering,2021,21(34):14625-14633.
Authors:Zhu Daoheng  Li Zhiqiang
Institution:Guangdong Ocean University
Abstract: To address the high time complexity and rapidly growing running time of the maximal information coefficient (MIC) approximation algorithm on large-scale data, a Parallel Computing Maximal Information Coefficient (PCMIC) algorithm is proposed in this study. Fourteen typical correlation relationships were computed in parallel with the PCMIC algorithm at different data sizes and noise levels under the Spark and Spark-Message Passing Interface (MPI) computing frameworks. In addition, two representative relationships were chosen to test the performance of the PCMIC algorithm under the two frameworks with different numbers of nodes. The experimental results lead to the following conclusions. Firstly, under both frameworks the PCMIC algorithm retains the generality and equitability of the original MIC approximation algorithm and also scales well. Secondly, as the data size and the number of nodes increase, the running time of the PCMIC algorithm in both frameworks grows significantly more slowly than that of the MIC approximation algorithm, and the Spark-MPI framework slightly outperforms Spark in parallel speedup and efficiency. Lastly, Spark is capable of supporting the scheduling of MPI tasks, which lays a theoretical and practical foundation for studying the integration of different parallel computing frameworks.
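The abstract compares the two frameworks by parallel speedup ratio and efficiency. The sketch below assumes the standard textbook definitions (speedup as serial time over parallel time, efficiency as speedup per node); the abstract does not state the exact formulas used, and the timings are invented purely for illustration.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Parallel speedup ratio: serial running time / parallel running time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("running times must be positive")
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_serial, t_parallel) / nodes

# Hypothetical timings in seconds, NOT measurements from the paper.
t_serial = 120.0   # single-node run
t_parallel = 40.0  # run on 4 nodes
s = speedup(t_serial, t_parallel)         # 3.0
e = efficiency(t_serial, t_parallel, 4)   # 0.75
```

An efficiency below 1 reflects overhead such as task scheduling and data movement, so a framework with slightly higher efficiency at the same node count spends less of its time on that overhead.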
Keywords:MIC  Parallel Computing  Parallel Computing Maximal Information Coefficient  Spark  Message Passing Interface
This article is indexed in Wanfang Data and other databases.