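An Extensible Algorithm for Filling Missing Data Values
Cite this article: ZHOU Ji-cheng, ZHOU Hong-guang. An Extensible Algorithm for Filling Missing Data Values[J]. Journal of Central South University (Science and Technology), 2004, 35(5): 825-829.
Authors: ZHOU Ji-cheng  ZHOU Hong-guang
Affiliation: School of Information Science and Engineering, Central South University, Changsha 410083, Hunan, China
Abstract: To address the problem of filling missing values during data preprocessing, an extensible missing-value filling algorithm is designed using the strategy pattern. Three concrete strategy classes, SimpleImputation, KNNImputation and DTBImputation, are constructed to encapsulate the simple, KNN and DTB missing-value filling algorithms, respectively. Experimental results show that the simple filling algorithm runs fastest but is the least accurate, the DTB algorithm is slower but more accurate, and the KNN algorithm is the slowest but the most accurate. The algorithm lets users select a filling algorithm according to their own speed and accuracy requirements and extend its filling capability by adding new strategy classes, thereby resolving the data quality problems caused by missing values and improving the generality and extensibility of data preprocessing programs.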
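The strategy-pattern design summarized in the abstract might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the class names SimpleImputation and KNNImputation come from the abstract, while the `ImputationStrategy` interface, the `MissingValueFiller` context class, the `fill` method, column-mean filling, and the distance measure are all assumptions made for the example. DTBImputation is left out because the abstract does not describe how it works.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


class ImputationStrategy(ABC):
    """Hypothetical common interface for all filling strategies."""

    @abstractmethod
    def fill(self, rows: List[Row]) -> List[List[float]]:
        ...


def column_means(rows: List[Row]) -> List[float]:
    """Mean of the observed values in each column (0.0 if none observed)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


class SimpleImputation(ImputationStrategy):
    """Assumed behaviour: replace each missing value with its column mean."""

    def fill(self, rows):
        means = column_means(rows)
        return [[means[j] if v is None else v for j, v in enumerate(r)]
                for r in rows]


class KNNImputation(ImputationStrategy):
    """Assumed behaviour: average the value over the k nearest rows."""

    def __init__(self, k: int = 2):
        self.k = k

    def fill(self, rows):
        means = column_means(rows)
        filled = []
        for i, row in enumerate(rows):
            new_row = list(row)
            for j, value in enumerate(row):
                if value is not None:
                    continue
                candidates = []
                for m, other in enumerate(rows):
                    if m == i or other[j] is None:
                        continue
                    # Distance over the columns observed in both rows.
                    shared = [(a - b) ** 2 for a, b in zip(row, other)
                              if a is not None and b is not None]
                    if shared:
                        dist = sqrt(sum(shared) / len(shared))
                        candidates.append((dist, other[j]))
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:self.k]
                # Fall back to the column mean when no neighbour is usable.
                new_row[j] = (sum(v for _, v in nearest) / len(nearest)
                              if nearest else means[j])
            filled.append(new_row)
        return filled


class MissingValueFiller:
    """Context object: delegates to whichever strategy the user selected."""

    def __init__(self, strategy: ImputationStrategy):
        self.strategy = strategy

    def fill(self, rows: List[Row]) -> List[List[float]]:
        return self.strategy.fill(rows)


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, None], [10.0, 20.0], [1.5, 3.0]]
    print(MissingValueFiller(SimpleImputation()).fill(data))
    print(MissingValueFiller(KNNImputation(k=2)).fill(data))
```

Under this arrangement, adding a new filling method means adding one more subclass of the interface. The context class and its callers stay unchanged, which is the kind of extensibility the abstract attributes to the design.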

Keywords: missing values  strategy pattern  data preprocessing
Article ID: 1672-7207(2004)05-0825-05
Revised: 28 March 2004

An Extensible Algorithm for Filling Missing Data Values
ZHOU Ji-cheng, ZHOU Hong-guang. An Extensible Algorithm for Filling Missing Data Values[J]. Journal of Central South University: Science and Technology, 2004, 35(5): 825-829.
Authors:ZHOU Ji-cheng  ZHOU Hong-guang
Abstract: To fill missing data values during data pre-processing, an extensible algorithm based on the strategy pattern is put forward. In the algorithm, three concrete strategy classes are used to encapsulate the simple-filling, KNN-filling and DTB-filling algorithms for dealing with missing data values. The experimental results show that the simple-filling algorithm is the fastest but the least precise, the DTB-filling algorithm is slower but more precise, and the KNN-filling algorithm is the slowest but the most precise. By allowing users to choose a suitable algorithm according to their own requirements, such as time or precision, and to extend the filling function by adding new strategy classes, the extensible algorithm not only solves the data quality problem caused by missing data values but also improves the generality and extensibility of the data pre-processing application.
Keywords:missing data values  strategy pattern  data pre-processing
This article is indexed in CNKI, VIP, Wanfang Data and other databases.