期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On the Imputation of Missing Data in Surveys with Likert-Type Scales

Maurizio Carpita Marica Manisera 《Journal of Classification》2011,28(1):93-112

Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement for the imputation procedure proposed by Lavori, Dawson, and Shera (1995) here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the ??propensity score step?? of ABP, randomly selects a donor in the nonrespondent??s neighbourhood, which includes cases with response patterns similar to the one of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likerttype scales from a real data set shows that ABPN: (a) performed better than the ABP imputation, and (b) can be considered as a serious competitor of other procedures used in this context. 相似文献

2.

Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm

Antonio D’Ambrosio Massimo Aria Roberta Siciliano 《Journal of Classification》2012,29(2):227-258

Framework of this paper is statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets presenting some lack of information. Assuming a missing at random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the Statistical Learning Theory of Vapnik. It considers both an incremental variable imputation method to improve computational efficiency as well as boosted trees to gain in prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus reducing at minimum the generalization (or prediction) error of imputation. Moreover, it is distribution free, it holds independently of the underlying probability law generating missing data values. Performance analysis is discussed considering simulation case studies and real world applications. 相似文献

3.

Fitting a Mixture Model to Three-mode Three-way Data with Missing Information

Lynette A. Hunt Kaye E. Basford 《Journal of Classification》2001,18(2):209-226

相似文献

4.

Maximum Likelihood Estimation and Model Comparison for Mixtures of Structural Equation Models with Ignorable Missing Data

Sik-Yum Lee Xin-Yuan Song 《Journal of Classification》2003,20(2):221-255

The objective of this paper is to develop the maximum likelihood approach for analyzing a finite mixture of structural equation models with missing data that are missing at random. A Monte Carlo EM algorithm is proposed for obtaining the maximum likelihood estimates. A well-known statistic in model comparison, namely the Bayesian Information Criterion (BIC), is used for model comparison. With the presence of missing data, the computation of the observed-data likelihood function value involved in the BIC is not straightforward. A procedure based on path sampling is developed to compute this function value. It is shown by means of simulation studies that ignoring the incomplete data with missing entries gives less accurate ML estimates. An illustrative real example is also presented. 相似文献

5.

Incremental Classification with Generalized Eigenvalues 总被引：2，自引：0，他引：2

Claudio Cifarelli Mario R. Guarracino Onur Seref Salvatore Cuciniello Panos M. Pardalos 《Journal of Classification》2007,24(2):205-219

Supervised learning techniques are widely accepted methods to analyze data for scientific and real world problems. Most of these problems require fast and continuous acquisition of data, which are to be used in training the learning system. Therefore, maintaining such systems updated may become cumbersome. Various techniques have been devised in the field of machine learning to solve this problem. In this study, we propose an algorithm to reduce the training data to a substantially small subset of the original training data to train a generalized eigenvalue classifier. The proposed method provides a constructive way to understand the influence of new training data on an existing classification function. We show through numerical experiments that this technique prevents the overfitting problem of the earlier generalized eigenvalue classifiers, while promising a comparable performance in classification with respect to the state-of-the-art classification methods. 相似文献

6.

Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis

Julie Josse Marie Chavent Benot Liquet Fran?ois Husson 《Journal of Classification》2012,29(1):91-116

A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity analysis framework. 相似文献

7.

Unfolding Incomplete Data: Guidelines for Unfolding Row-Conditional Rank Order Data with Random Missings

Frank M. T. A. Busing Mark de Rooij 《Journal of Classification》2009,26(3):329-360

Unfolding creates configurations from preference information. In this paper, it is argued that not all preference information needs to be collected and that good solutions are still obtained, even when more than half of the data is missing. Simulation studies are conducted to compare missing data treatments, sources of missing data, and designs for the specification of missing data. Guidelines are provided and used in actual practice. 相似文献

8.

Mixtures of Autoregressions with an Improper Component for Panel Data

Nicholas T. Longford Pierpaolo D’Urso 《Journal of Classification》2012,29(3):341-362

An EM algorithm for fitting mixtures of autoregressions of low order is constructed and the properties of the estimators are explored on simulated and real datasets. The mixture model incorporates a component with an improper density, which is intended for outliers. The model is proposed as an alternative to the search for the order of a single-component autoregression. The methods can be adapted to other patterns of dependence in panel data. An application to the monthly records of income of the outlets of a retail company is presented. 相似文献

9.

Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data 总被引：1，自引：0，他引：1

Javier Palarea-Albaladejo Josep Antoni Martín-Fernández Jesús A. Soto 《Journal of Classification》2012,29(2):144-169

Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements??compositions??are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the FCM algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed in the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study is illustrated using a nutritional data set known in the clustering literature. 相似文献

10.

Fitting a Mixture Model to Three-Mode Three-Way Data with Categorical and Continuous Variables

Lynette A. Hunt Kaye E. Basford 《Journal of Classification》1999,16(2):283-296

相似文献