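A Parallel Data Mining Algorithm Based on the G4ICCS System
Cite this article: LIU Wei, LU Lai-jun, WANG Hong-xiao, CAO Yan-bo. A Parallel Data Mining Algorithm Based on the G4ICCS System[J]. Journal of Jilin University (Information Science Edition), 2013, 31(3): 324-327.
Authors: LIU Wei  LU Lai-jun  WANG Hong-xiao  CAO Yan-bo
Affiliations: Institute of Comprehensive Information Mineral Prediction, Jilin University, Changchun 130026; Center for Computer Fundamental Education, Jilin University, Changchun 130012; College of Earth Sciences, Jilin University, Changchun 130026; Center for Computer Fundamental Education, Jilin University, Changchun 130012
Funding: Supported by the Jilin Province "Twelfth Five-Year" Mineral Resources Planning and Prediction Fund (3R212H104422)
Abstract: The traditional decision tree algorithm SPRINT (Scalable Parallelizable Induction of Decision Trees) cannot cope with data mining over massive geoscience data sets. To address this, PSPRINT, a parallel decision tree classification algorithm based on G4ICCS (Geology Geography Geochemistry Geophysics Information Cloud Computing System), was designed and implemented. The algorithm uses a hash table to store the data records on either side of the split point of a continuous attribute, which provides the basis for splitting nodes in parallel, and it solves the massive geoscience data mining problem under the MapReduce architecture. Experimental results show that, in a simulated cloud computing environment, the parallel decision tree algorithm can handle the classification of massive geoscience data with good stability and high processing speed.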

Keywords: geoscience G4ICCS system  data mining  decision tree algorithm  parallel
Received: 2013-02-28

Data Mining Parallel Algorithm Based on G4ICCS
LIU Wei (a, b), LU Lai-jun (c), WANG Hong-xiao (b), CAO Yan-bo. Data Mining Parallel Algorithm Based on G4ICCS[J]. Journal of Jilin University: Information Sci Ed, 2013, 31(3): 324-327.
Authors: LIU Wei (a, b)  LU Lai-jun (c)  WANG Hong-xiao (b)  CAO Yan-bo
Institution: a. Mineral Prediction Institute, Jilin University, Changchun 130026, China; b. Center for Computer Fundamental Education, Jilin University, Changchun 130012, China; c. College of Earth Sciences, Jilin University, Changchun 130026, China
Abstract: Because the traditional decision tree algorithm SPRINT (Scalable Parallelizable Induction of Decision Trees) cannot handle the mining of massive geoscience data, this paper designs and implements PSPRINT, a parallel decision tree classification algorithm based on G4ICCS (Geology Geography Geochemistry Geophysics Information Cloud Computing System). The algorithm uses a hash table to store the data records on either side of the split point of a continuous attribute, which provides the basis for splitting parallel nodes, and it addresses the massive geoscience data mining problem under the MapReduce architecture. The experimental results show that, in a simulated cloud computing environment, the parallel decision tree algorithm can handle the classification of massive geoscience data with good stability and high processing speed.
Keywords:geology geography geochemistry geophysics information cloud computing system(G4ICCS)  data mining  decision tree algorithm  parallel  
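
The split step described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's PSPRINT implementation, whose details the abstract does not give. It assumes the usual SPRINT conventions: each attribute is kept as a list of (value, class label, record id) tuples, and the split point of a continuous attribute is chosen by minimum weighted Gini index. The function names (`best_split`, `build_side_table`, `partition`) and the sample data are hypothetical. The hash table maps each record id to the side of the split it falls on, so that the lists of the other attributes can be divided consistently; in a MapReduce setting that division could be run per attribute list in parallel.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attr_list):
    """Scan one continuous attribute list of (value, label, rid) tuples.

    Returns (threshold, weighted_gini) for the best test `value <= threshold`.
    Assumption: Gini index as the split criterion, as in standard SPRINT.
    """
    rows = sorted(attr_list, key=lambda r: r[0])
    n = len(rows)
    below = Counter()
    above = Counter(label for _, label, _ in rows)
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label, _ = rows[i]
        below[label] += 1
        above[label] -= 1
        next_value = rows[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below)
                 + n_above * gini(above, n_above)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


def build_side_table(attr_list, threshold):
    """Hash table: record id -> 'L' or 'R' side of the split point."""
    return {rid: ("L" if value <= threshold else "R")
            for value, _, rid in attr_list}


def partition(other_attr_list, side_table):
    """Divide another attribute's list by looking up each record id."""
    left = [r for r in other_attr_list if side_table[r[2]] == "L"]
    right = [r for r in other_attr_list if side_table[r[2]] == "R"]
    return left, right


# Hypothetical sample: two attribute lists over the same four records.
grade = [(0.25, "barren", 0), (0.5, "barren", 1), (1.0, "ore", 2), (1.5, "ore", 3)]
depth = [(30.0, "ore", 2), (10.0, "barren", 0), (40.0, "ore", 3), (20.0, "barren", 1)]

threshold, score = best_split(grade)
side_table = build_side_table(grade, threshold)
left, right = partition(depth, side_table)
```

Here the winning attribute list builds the hash table once, and every other attribute list is divided by a lookup on record id alone, which is what lets each list be processed independently.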
This article is indexed in CNKI, Wanfang Data, and other databases.