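Research on variable selection algorithms for generalized linear models with massive data
Cite this article: Chen Shaodong, Li Zhiqiang. Research on variable selection algorithms for generalized linear models with massive data[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2020, 47(2): 130-136.
Authors: Chen Shaodong, Li Zhiqiang
Affiliation: College of Mathematics and Science, Beijing University of Chemical Technology, Beijing 100029
Abstract: We first derive a non-convex penalized iterative estimation algorithm for the variable selection problem in general generalized linear models. We then modify the algorithm using the divide-and-conquer idea so that it applies to massive data, addressing problems such as memory overflow that can arise when performing variable selection on massive data. Considering the tools actually used to process massive data today, we further give the computation steps of the algorithm under distributed parallelism, which substantially improves computation speed. In numerical simulations, the algorithm is run both on a single machine and on a cluster; the results show that the method effectively resolves the data storage problem and is suitable for distributed environments. Finally, the proposed algorithm is used to perform variable selection for a Probit model, which is then applied to a news dataset classification problem.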

Keywords: massive data; generalized linear model; variable selection; divide-and-conquer algorithm
Received: 2019-11-03

A variable selection algorithm for generalized linear models of massive data
CHEN ShaoDong, LI ZhiQiang. A variable selection algorithm for generalized linear models of massive data[J]. Journal of Beijing University of Chemical Technology, 2020, 47(2): 130-136.
Authors: CHEN ShaoDong, LI ZhiQiang
Institution: College of Mathematics and Science, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: We derive an iterative estimation algorithm with a non-convex penalty for variable selection in generalized linear models, and then modify it using the divide-and-conquer principle so that it can be applied to massive data, avoiding problems such as memory overflow that can arise when performing variable selection on massive data. Taking into account the tools currently used in practice to process massive data, we also give the computation steps of the algorithm under distributed parallelism, which greatly improves the calculation speed. In numerical simulations, the algorithm was run in two ways: on a single machine and on a cluster. The results show that the method effectively solves the data storage problem and is suitable for distributed environments. Finally, the algorithm was used to perform variable selection for a Probit model, which was then applied to the classification of a news dataset.
Keywords: massive data; generalized linear model; variable selection; divide-and-conquer
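
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general divide-and-conquer idea it names, not the authors' method. Everything in it is an assumption made for illustration: a logistic model stands in for the general GLM (the paper's application uses a Probit model), the non-convex penalty is taken to be MCP, each block is fitted by proximal gradient descent, and block results are combined by majority vote plus a simple average. The function names (`mcp_prox`, `fit_block`, `divide_and_conquer`) and all tuning values are hypothetical.

```python
import numpy as np

def mcp_prox(z, lam, gamma, step):
    """Elementwise proximal map of step * MCP(lam, gamma); needs gamma > step."""
    shrunk = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0) / (1.0 - step / gamma)
    # Beyond gamma * lam the MCP penalty is flat, so large values pass through unchanged.
    return np.where(np.abs(z) > gamma * lam, z, shrunk)

def fit_block(X, y, lam=0.08, gamma=3.0, step=1.0, n_iter=500):
    """MCP-penalized logistic regression on one data block (proximal gradient)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        grad = X.T @ (mu - y) / n              # gradient of the mean negative log-likelihood
        beta = mcp_prox(beta - step * grad, lam, gamma, step)
    return beta

def divide_and_conquer(X, y, n_blocks=5, **fit_kwargs):
    """Fit each block separately, then combine: majority vote + simple average."""
    blocks = zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks))
    betas = np.array([fit_block(Xk, yk, **fit_kwargs) for Xk, yk in blocks])
    votes = (betas != 0).sum(axis=0)           # how many blocks selected each variable
    selected = votes > n_blocks / 2
    beta_hat = np.where(selected, betas.mean(axis=0), 0.0)
    return beta_hat, selected

# Simulated data (assumed setup): 8 predictors, 3 of them truly active.
rng = np.random.default_rng(0)
true_beta = np.array([1.5, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
X = rng.standard_normal((10_000, true_beta.size))
y = (rng.random(10_000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, selected = divide_and_conquer(X, y)
print("selected indices:", np.flatnonzero(selected))
print("estimates:", np.round(beta_hat, 2))
```

Only one block needs to be in memory at a time, and the per-block fits are independent, which is what makes this pattern a natural fit for a cluster: each worker could run `fit_block` on its own partition and return only a length-`p` coefficient vector for aggregation.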