Accurately Identify Zombie Enterprises Based on Decision Tree-Logistic Regression Model
Cite this article: WU Dongpeng, WANG Zheng, TONG Wei, YE Feng, SONG Chuqiao. Accurately Identify Zombie Enterprises Based on Decision Tree-Logistic Regression Model[J]. Journal of Applied Sciences, 2021, 39(4): 569-580.
Authors: WU Dongpeng  WANG Zheng  TONG Wei  YE Feng  SONG Chuqiao
Institution: 1. School of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China; 2. School of Business, Hohai University, Nanjing 211100, Jiangsu, China
Funding: Fundamental Research Funds for the Central Universities (No. B200202185); Jiangsu Province "Six Talent Peaks" Project (No. XYDXX-078); Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 19KJB630006)
Abstract: To address the problem of accurately identifying zombie enterprises, a decision tree-logistic regression identification method is proposed, using the enterprise information data set published by Hunan Kechuang Information Co., Ltd. The method fills missing values and replaces outliers with the median, then analyzes the data set and derives new features, and finally selects features using multiple linear regression, the chi-square test, and related methods. To verify its effectiveness, the method is compared with the over-borrowing method, the continuous-loss method, the random forest algorithm, the BP neural network algorithm, and the XGBoost algorithm in both an Alibaba Cloud environment and a local environment. Each model is trained 50 times, with the data for each run randomly selected according to a given proportion, and the mean of each metric over the runs is reported as the final result. The experimental results show that the proposed decision tree-logistic regression model achieves the highest accuracy in identifying zombie enterprises, reaching 99.98%, and that it runs considerably faster than the ensemble models, with an average execution time of about 1.5 s. The results differ only slightly across the experimental environments, which supports the effectiveness and stability of the model.

Keywords: zombie enterprise  machine learning  feature engineering  decision tree-logistic regression
Received: 2020-08-26
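
The median-filling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how outliers were detected, so the interquartile-range rule used here, the factor `k=1.5`, the function name `fill_with_median`, and the sample values are all assumptions made for illustration and are not taken from the paper.

```python
import statistics


def fill_with_median(values, k=1.5):
    """Replace missing entries (None) and outliers with the column median.

    Assumption: an outlier is any value outside [Q1 - k*IQR, Q3 + k*IQR].
    The paper's abstract does not specify its outlier criterion.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    # Quartiles of the observed (non-missing) values.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        median if v is None or not (low <= v <= high) else v
        for v in values
    ]


# Hypothetical feature column with one missing value and one extreme value.
column = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0, 11.0, 13.0, 12.0, 10.0]
cleaned = fill_with_median(column)
print(cleaned)
```

The median is used rather than the mean because a single extreme value, such as the `500.0` in the hypothetical column, would pull the mean far away from the typical value.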
This article is indexed in CNKI and other databases.