
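User Behavior Prediction Based on Second-Order Combination Feature Engineering and the XGBoost Model
Cite this article: Yang Li-hong, Bai Zhao-qiang. User behavior prediction based on second-order combination feature engineering and the XGBoost model [J]. Science Technology and Engineering (科学技术与工程), 2018, 18(14).
Authors: Yang Li-hong (杨立洪), Bai Zhao-qiang (白肇强)
Affiliation: School of Mathematics, South China University of Technology
Funding: Guangdong Province Industry-University-Research Collaborative Innovation Achievement Transformation Project (2016B090918041) and Guangzhou Industry-University-Research Collaborative Innovation Major Project (201504302222568)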
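Abstract: Feature construction has long been a difficulty in data mining. Conventional, fixed feature engineering yields very limited gains for data mining tasks whose business scenarios vary widely, so feature construction has become one of the bottlenecks of data mining. With machine learning algorithms developing rapidly, features are increasingly the part of a model that most needs attention. Based on user behavior data from an e-commerce platform, this paper proposes a method for constructing second-order (quadratic) combined statistical features on top of the original feature group. Second-order crossing derives a rich feature group that fits the business scenario. Two sliding-window methods are combined with it: a fixed-length sliding window to obtain more training samples, and a variable-length sliding window to obtain training features that carry time weights, so as to reproduce users' real behavioral habits as closely as possible. Finally, different feature combinations, together with dimensionality reduction, are used to build control models, which are cross-validated with linear logistic regression, a linear support vector machine, and the tree models extremely randomized trees and XGBoost. The results show that the combined features are expressed very well by the tree-model algorithms, and that in both linear and tree models the derived feature group achieves a higher F1 score than the base feature group.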
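The abstract does not give the exact feature definitions, so the following is only a minimal sketch of the three ideas it names: second-order combined statistical features, a fixed-length sliding window for generating samples, and a variable-length sliding window for time-weighted features. Everything concrete here is an assumption made for illustration and is not taken from the paper: the event-log layout `(user, item, action, day)`, the action names, the ratio-style crosses, and the window lengths.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event log: (user_id, item_id, action, day). Not from the paper.
EVENTS = [
    ("u1", "i1", "click", 1), ("u1", "i1", "click", 2), ("u1", "i1", "cart", 3),
    ("u1", "i1", "buy", 5), ("u2", "i1", "click", 4), ("u2", "i2", "click", 5),
    ("u2", "i2", "cart", 6), ("u2", "i2", "buy", 7),
]
ACTIONS = ("click", "cart", "buy")  # assumed action vocabulary


def base_counts(events, user, start, end):
    """Base statistical features: action counts for one user over days [start, end)."""
    counts = Counter(a for u, _, a, d in events if u == user and start <= d < end)
    return {a: counts[a] for a in ACTIONS}


def second_order(base):
    """Cross every pair of base features into a ratio (one possible second-order combination)."""
    out = {}
    for a, b in combinations(sorted(base), 2):
        out[f"{a}_per_{b}"] = base[a] / (base[b] + 1)  # +1 avoids division by zero
    return out


def fixed_windows(events, users, length, first_day, last_day):
    """Slide a fixed-length window one day at a time; each position yields one sample per user."""
    samples = []
    for start in range(first_day, last_day - length + 1):
        end = start + length  # features come from [start, end), the label from day `end`
        for u in users:
            feats = base_counts(events, u, start, end)
            feats.update(second_order(feats))
            label = int(any(x == u and a == "buy" and d == end for x, _, a, d in events))
            samples.append((u, start, feats, label))
    return samples


def variable_windows(events, user, predict_day, lengths=(1, 3, 7)):
    """Recompute the features over windows of several lengths ending just before predict_day.

    Recent days fall inside every window, so they are implicitly weighted more.
    """
    feats = {}
    for n in lengths:
        base = base_counts(events, user, predict_day - n, predict_day)
        for name, value in {**base, **second_order(base)}.items():
            feats[f"{name}_last{n}d"] = value
    return feats


samples = fixed_windows(EVENTS, ["u1", "u2"], length=3, first_day=1, last_day=7)
recent = variable_windows(EVENTS, "u2", predict_day=7)
```

In this sketch the fixed-length window turns one week of logs into several labeled samples per user, while the variable-length windows describe the same prediction day at several time scales. A real pipeline would likely cross user, item, and category statistics as well, which the abstract hints at but does not detail.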

Keywords: feature engineering; second-order combination features; user behavior prediction; XGBoost model
Received: 2017/11/8
Revised: 2018/1/24

User Behavior Prediction Based on Combinatorial Statistical Features and XGBoost
Yang Li-hong, Bai Zhao-qiang. User behavior prediction based on combinatorial statistical features and XGBoost [J]. Science Technology and Engineering, 2018, 18(14).
Authors: Yang Li-hong and Bai Zhao-qiang
Institution: Department of Mathematics, South China University of Technology
Abstract: Feature construction has always been a problem in data mining, and conventional approaches to feature engineering no longer meet the needs of varied data mining tasks. With machine learning developing rapidly, feature engineering plays an increasingly important role. This paper uses user behavior data to construct combined statistical features from the original features, which suit the business scenario particularly well. Two sliding-window methods are also used: a fixed-length sliding window to obtain more training samples, and a variable-length sliding window to obtain more features from different time dimensions, with the aim of reproducing users' real daily habits as closely as possible. Finally, different combinations of features are used in control experiments, with LR, SVM, ET and XGBoost as the models. The results show that, in both the linear models and the tree models, the F1 score of the combined feature group is higher than that of the original feature group.
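Since the comparison between feature groups rests on the F1 score, a short worked computation may help. This is a minimal sketch of the standard binary F1 definition (the harmonic mean of precision and recall for the positive class). The labels below are toy values chosen for illustration and are not results from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # convention: no correct positive predictions gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (1 = user performed the target behavior).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)  # tp=2, fp=1, fn=1, so precision = recall = 2/3
```

F1 is a common choice when positives are rare, which is typical of behavior prediction tasks, because accuracy alone would reward predicting the majority class for every sample.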
Keywords: feature engineering; feature combination; user behavior prediction; XGBoost
This article is indexed in CNKI and other databases.