
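PDM: A Parallel Data Analysis System Based on Hadoop
Cite this article: DUAN Song-qing, WU Bin, YU Le, WANG Bai. PDM: A parallel data analysis system based on Hadoop[J]. Journal of Hunan University (Natural Science), 2012, 39(10): 87-92.
Authors: DUAN Song-qing, WU Bin, YU Le, WANG Bai
Institution: School of Computer Science, Beijing University of Posts and Telecommunications
Funding: National Natural Science Foundation of China (90924029, 60905025, 61074128)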
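Abstract: This paper presents PDM, a parallel data analysis system based on Hadoop. The system provides a large number of parallel data analysis algorithms that use MapReduce as their computing framework. These include not only traditional ETL, data mining, statistics, and text analysis algorithms but also graph-theoretic SNA (social network analysis) algorithms. The principles and implementation of the parallel multiple linear regression algorithm and the "multi-source shortest path" algorithm are described in detail; the proposed "message-passing model" effectively addresses the difficulty MapReduce has in processing adjacency matrices. Typical applications on telecom data are then introduced, such as "service package recommendation" implemented with parallel k-means and decision tree algorithms, and "marketing key point discovery" implemented with the parallel PageRank algorithm. Finally, performance tests show that the system is suited to processing large-scale data efficiently.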

Keywords: cloud computing; Hadoop; parallel algorithms; data mining; social network analysis
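The abstract does not spell out how the parallel multiple linear regression is formulated. One common way to fit ordinary least squares in a MapReduce setting is to have each mapper compute partial sums of X^T X and X^T y over its data split, let a reducer add the partials, and then solve the normal equations (X^T X) b = X^T y on a single node. The Python sketch below simulates that pattern in memory; it is an assumption about the general technique, not necessarily PDM's implementation, and the function names (`map_split`, `reduce_partials`, `solve`, `fit`) are hypothetical.

```python
"""Sketch (assumed technique, not PDM's code): multiple linear regression
in MapReduce style via summed normal equations.

Each "mapper" sees one data split and emits partial X^T X and X^T y;
the "reducer" adds them; the driver solves (X^T X) b = X^T y.
"""
from functools import reduce
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def map_split(rows: Sequence[Tuple[Sequence[float], float]]) -> Tuple[Matrix, Vector]:
    """Mapper: partial X^T X and X^T y for one split (intercept column prepended)."""
    k = len(rows[0][0]) + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for features, y in rows:
        x = [1.0, *features]
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_partials(a: Tuple[Matrix, Vector], b: Tuple[Matrix, Vector]) -> Tuple[Matrix, Vector]:
    """Reducer: element-wise sum of two partial results."""
    (ma, va), (mb, vb) = a, b
    return (
        [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ma, mb)],
        [p + q for p, q in zip(va, vb)],
    )


def solve(a: Matrix, b: Vector) -> Vector:
    """Gauss-Jordan elimination with partial pivoting; assumes a is nonsingular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(splits: Sequence[Sequence[Tuple[Sequence[float], float]]]) -> Vector:
    """Driver: map every split, reduce the partials, solve; intercept comes first."""
    xtx, xty = reduce(reduce_partials, map(map_split, splits))
    return solve(xtx, xty)
```

For example, `fit([[((0, 0), 1.0), ((1, 0), 3.0)], [((0, 1), 4.0), ((1, 1), 6.0), ((2, 1), 8.0)]])` uses two splits of points drawn from y = 1 + 2x1 + 3x2 and returns coefficients close to [1.0, 2.0, 3.0]. The appeal of this design is that each mapper emits only a (k+1) x (k+1) matrix and a vector regardless of how many rows it reads; the trade-off is that the normal equations can become ill-conditioned when features are strongly correlated.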

PDM:A Parallel Data Analysis System Based on Hadoop
DUAN Song-qing, WU Bin, YU Le, WANG Bai. PDM: A parallel data analysis system based on Hadoop[J]. Journal of Hunan University (Natural Science), 2012, 39(10): 87-92.
Authors: DUAN Song-qing, WU Bin, YU Le, WANG Bai
Institution: (School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Abstract: A PDM (Parallel Data Mining) system was built on Hadoop. PDM contains a large number of parallel data analysis algorithms built on the MapReduce computational framework. These include not only classic algorithms for ETL, data mining, statistical analysis, and text analysis, but also SNA (social network analysis) algorithms based on graph mining. The principles and implementation of the parallel multiple linear regression algorithm and the multi-source shortest path algorithm are described, and the proposed "message-passing model" effectively addresses MapReduce's difficulty in handling adjacency matrix structures. The paper also illustrates typical telecommunications applications, such as "business recommendation" based on parallel k-means and decision tree algorithms and "marketing key point discovery" based on the parallel PageRank algorithm. Finally, performance test results show that the system is suitable for processing large-scale data efficiently.
Keywords: cloud computing; Hadoop; parallel algorithms; data mining; social network analysis
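The abstract credits a "message-passing model" with avoiding the adjacency matrices that MapReduce handles poorly, but gives no details of it. The following is a generic sketch of the idea, assuming it resembles Bellman-Ford-style relaxation: the graph is stored as an adjacency list keyed by node, each node keeps a table of its best known distance from every source, mappers send each neighbor a message carrying candidate distances, and reducers keep the per-source minimum. Rounds repeat until no distance changes. The code simulates the shuffle phase in memory; the names `map_node`, `reduce_node`, and `multi_source_shortest_paths` are hypothetical and not taken from PDM.

```python
"""Sketch (assumed design, not PDM's code): multi-source shortest paths as
iterated map/reduce rounds of message passing over an adjacency list.

Assumptions: every node appears as a key in the graph (even with no
out-edges), and edge weights are non-negative.
"""
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Node = Hashable
Graph = Dict[Node, List[Tuple[Node, float]]]
Table = Dict[Node, float]          # source -> best known distance
Dist = Dict[Node, Table]           # node -> its distance table


def map_node(node: Node, edges: List[Tuple[Node, float]], table: Table) -> Iterator[Tuple[Node, Table]]:
    """Mapper: re-emit the node's own table, then message each neighbor
    with 'my distance from each source + edge weight'."""
    yield node, table
    for neighbor, weight in edges:
        yield neighbor, {src: d + weight for src, d in table.items()}


def reduce_node(messages: Iterable[Table]) -> Table:
    """Reducer: per source, keep the smallest distance seen in any message."""
    best: Table = {}
    for msg in messages:
        for src, d in msg.items():
            if d < best.get(src, float("inf")):
                best[src] = d
    return best


def multi_source_shortest_paths(graph: Graph, sources: Iterable[Node]) -> Dist:
    """Driver: one loop iteration plays the role of one MapReduce job."""
    dist: Dist = {v: {} for v in graph}
    for s in sources:
        dist[s][s] = 0.0
    while True:
        shuffle: Dict[Node, List[Table]] = defaultdict(list)  # group mapper output by key
        for node, edges in graph.items():
            for key, msg in map_node(node, edges, dist[node]):
                shuffle[key].append(msg)
        new_dist = {v: reduce_node(shuffle[v]) for v in graph}
        if new_dist == dist:  # no distance improved in this round: converged
            return new_dist
        dist = new_dist
```

On the graph `{"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)], "d": []}` with sources `a` and `b`, the table for `d` converges to `{"a": 4.0, "b": 3.0}`. Because every round is a full job over all nodes, the number of jobs grows with the hop count of the longest shortest path, which is the main cost of this style of algorithm on Hadoop.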
This article is indexed in CNKI and other databases.