

Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm
Authors: Antonio D’Ambrosio, Massimo Aria, Roberta Siciliano
Institution: Department of Mathematics and Statistics, University of Naples Federico II, Via Cinthia, M.te S. Angelo, 80126, Naples, Italy
Abstract: This paper addresses statistical data editing: how to edit or impute missing or contradictory data, and how to merge two independent data sets that each lack some information. Assuming a missing-at-random mechanism, it provides an accurate tree-based methodology for both missing data imputation and data fusion, justified within the Statistical Learning Theory of Vapnik. The methodology combines an incremental variable imputation method, which improves computational efficiency, with boosted trees, which improve prediction accuracy relative to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus reducing the generalization (or prediction) error of imputation to a minimum. Moreover, the methodology is distribution-free: it holds independently of the underlying probability law generating the missing data values. Its performance is discussed through simulation case studies and real-world applications.
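Illustrative sketch (not taken from the paper): the abstract pairs incremental variable imputation with boosted trees, and the toy Python code below shows one way such a combination could look. Everything in it is an assumption made for illustration: the function names (fit_stump, boost, incremental_impute), the use of depth-one regression trees with squared-error boosting, the ordering of variables by number of missing values, and the requirement that at least one column is fully observed. It handles numeric data only, with None marking a missing value, and should not be read as the authors' algorithm.

```python
# Hypothetical sketch: incremental imputation with boosted regression stumps.
# None marks a missing value; at least one column must be fully observed.

def fit_stump(X, y):
    """Fit a depth-one regression tree (one split) by least squares."""
    n = len(y)
    mean = sum(y) / n
    best_sse, best = float("inf"), (None, None, mean, mean)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must leave observations on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    return best


def predict_stump(stump, row):
    j, t, lm, rm = stump
    if j is None:  # no valid split was found: predict the overall mean
        return lm
    return lm if row[j] <= t else rm


def boost(X, y, rounds=20, rate=0.5):
    """Squared-error boosting: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    resid = [v - base for v in y]
    stumps = []
    for _ in range(rounds):
        s = fit_stump(X, resid)
        stumps.append(s)
        resid = [r - rate * predict_stump(s, row) for r, row in zip(resid, X)]
    return lambda row: base + rate * sum(predict_stump(s, row) for s in stumps)


def incremental_impute(data):
    """Impute columns one at a time, fewest missing values first.

    Each imputed column joins the predictor set for the columns after it.
    """
    rows = [list(r) for r in data]  # work on a copy of the input
    p = len(rows[0])
    n_missing = [sum(r[j] is None for r in rows) for j in range(p)]
    complete = [j for j in range(p) if n_missing[j] == 0]
    todo = sorted((j for j in range(p) if n_missing[j] > 0),
                  key=lambda j: n_missing[j])
    for j in todo:
        train = [r for r in rows if r[j] is not None]
        model = boost([[r[k] for k in complete] for r in train],
                      [r[j] for r in train])
        for r in rows:
            if r[j] is None:
                r[j] = model([r[k] for k in complete])
        complete.append(j)  # the filled column is now usable as a predictor
    return rows
```

Calling incremental_impute on a list of rows returns a filled copy and leaves the input untouched. Processing the least incomplete variable first keeps each training set as large as possible, which is the intuition behind the incremental scheme in this sketch.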
This article is indexed in SpringerLink and other databases.