Research on missing data filling algorithm in different types of incomplete large data
Cite this article: Wang Wei, Su Qi, Zhou Wei, Liu Yin, Zhang Bin. Research on missing data filling algorithm in different types of incomplete large data[J]. Science Technology and Engineering (科学技术与工程), 2018, 18(8)
Authors: Wang Wei (王玮), Su Qi (苏琦), Zhou Wei (周伟), Liu Yin (刘荫), Zhang Bin (张宾)
Affiliations: Information and Communication Company, State Grid Shandong Electric Power Company (Wang Wei, Su Qi, Zhou Wei, Liu Yin, Zhang Bin); School of Computer Science, Shandong University (Liu Yin)
Abstract: Existing missing-data filling algorithms suffer from low accuracy, low running efficiency, and high memory usage. To address this, a new algorithm for filling missing data in different categories of incomplete big data is proposed. Two theorems explain the principle of the algorithm, and the calculation of information entropy is given. The inputs are a decision table built from the data set, together with the maximum, minimum, and filling step size of the missing data. The correlation between the indicators of one category and those of the other categories is computed to obtain a data set, from which the weight coefficients are derived. The information entropy of the initial database is calculated, and the lower bound of the missing-data interval is set from relevant theory or experience. Each missing value is replaced by a very small interval, which is widened repeatedly by the given step size; the information entropy at each step is plotted and compared with that of the initial database to carry out the filling. Experimental results show that the proposed algorithm achieves high accuracy and high running efficiency with low memory usage.

Keywords: different categories; incomplete; big data; missing data; filling
Received: 2017-08-07
Revised: 2017-10-13
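
The entropy comparison described in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the abstract gives neither the two theorems, the decision table, nor the weight coefficients, so the sketch replaces the growing interval with a plain scan over candidate values between the minimum and maximum at the given step, and keeps the candidate whose entropy is closest to that of the observed data. The function names `shannon_entropy` and `fill_by_entropy` and the `bin_width` parameter are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (in bits) of the values after fixed-width binning."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def fill_by_entropy(observed, lo, hi, step, bin_width=1.0):
    """Pick a fill value for one missing entry (illustrative assumption).

    Scans the candidates lo, lo + step, ... up to hi and returns the one
    for which the entropy of the completed column deviates least from the
    entropy of the observed values alone.
    """
    base = shannon_entropy(observed, bin_width)
    best_value, best_gap = None, float("inf")
    k = 0
    while lo + k * step <= hi + 1e-12:
        candidate = lo + k * step
        completed = list(observed) + [candidate]
        gap = abs(shannon_entropy(completed, bin_width) - base)
        if gap < best_gap:  # strict comparison keeps the earliest best candidate
            best_value, best_gap = candidate, gap
        k += 1
    return best_value


if __name__ == "__main__":
    column = [1.0, 1.2, 1.1, 5.0, 5.3, 1.4]  # observed values; one entry is missing
    print(fill_by_entropy(column, lo=0.0, hi=6.0, step=1.0))
```

The scan favours values that fall into an already populated bin, because opening a new bin raises the entropy sharply. A fuller treatment along the lines of the abstract would also weight the candidates by the correlation between indicator categories.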
This article is indexed in CNKI and other databases.