Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm

Authors:	Antonio D’Ambrosio, Massimo Aria, Roberta Siciliano

Institution:	Department of Mathematics and Statistics, University of Naples Federico II, Via Cinthia, M.te S. Angelo, 80126, Naples, Italy

Abstract:	This paper is set within the framework of statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets that each lack some information. Assuming a missing-at-random mechanism, it provides an accurate tree-based methodology for both missing data imputation and data fusion, justified within the Statistical Learning Theory of Vapnik. The methodology combines an incremental variable imputation method, which improves computational efficiency, with boosted trees, which improve prediction accuracy relative to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus minimizing the generalization (or prediction) error of imputation. Moreover, the methodology is distribution free: it holds independently of the underlying probability law generating the missing data values. Performance is assessed on simulated case studies and real-world applications.
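The abstract names two ingredients, incremental variable imputation and boosted trees, without giving the algorithm. The following Python sketch illustrates the general idea only and is not the authors' procedure. It rests on several assumptions that do not come from the abstract: numeric data with `None` marking missing values, every column having at least one observed value, columns processed from fewest to most missing values, and plain squared-error gradient boosting with depth-1 regression trees (stumps) standing in for whatever boosting variant the paper actually uses. All function names (`fit_stump`, `fit_boosted`, `incremental_impute`, and so on) are hypothetical.

```python
def fit_stump(X, y):
    """Fit a depth-1 regression tree by exhaustive search over split points."""
    mean_y = sum(y) / len(y)
    best = (None, None, mean_y, mean_y)          # fallback: constant prediction
    best_sse = sum((v - mean_y) ** 2 for v in y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                     # midpoint between distinct values
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    return best


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    if j is None:
        return left_mean
    return left_mean if row[j] <= t else right_mean


def fit_boosted(X, y, rounds=20, rate=0.5):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [v - p for v, p in zip(y, pred)]
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [p + rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, rate, stumps


def predict_boosted(model, row):
    base, rate, stumps = model
    return base + rate * sum(predict_stump(s, row) for s in stumps)


def incremental_impute(data):
    """Impute columns one at a time; each filled column becomes a predictor."""
    rows = [list(r) for r in data]                # work on a copy
    n_cols = len(rows[0])
    n_missing = [sum(r[j] is None for r in rows) for j in range(n_cols)]
    complete = [j for j in range(n_cols) if n_missing[j] == 0]
    # Assumed ordering: columns with the fewest missing values first.
    todo = sorted((j for j in range(n_cols) if n_missing[j] > 0),
                  key=lambda j: n_missing[j])
    for j in todo:
        train = [r for r in rows if r[j] is not None]
        model = fit_boosted([[r[k] for k in complete] for r in train],
                            [r[j] for r in train])
        for r in rows:
            if r[j] is None:
                r[j] = predict_boosted(model, [r[k] for k in complete])
        complete.append(j)                        # now usable as a predictor
    return rows


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0],
            [7.0, 20.0], [8.0, 20.0], [9.0, 20.0],
            [2.5, None], [8.5, None]]
    for row in incremental_impute(data):
        print(row)
```

Because each imputed column joins the set of predictors for the columns that follow, the pool of complete information grows at every step, which is the sense of "incremental" assumed in this sketch. Data fusion can be treated as the same problem by stacking the two data sets, so that variables observed in only one of them appear as missing blocks in the other.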

This article is indexed in SpringerLink and other databases.