Similar Articles
20 similar articles found (search time: 31 ms)
1.
In the framework of incomplete data analysis, this paper provides a nonparametric approach to missing data imputation based on Information Retrieval. In particular, an incremental procedure based on the iterative use of tree-based methods is proposed and a suitable Incremental Imputation Algorithm is introduced. The key idea is to define a lexicographic ordering of cases and variables so that conditional mean imputation via binary trees can be performed incrementally. A simulation study and real data applications are carried out to describe the advantages and the performance with respect to standard approaches.
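As a toy illustration of conditional mean imputation via a binary tree, the depth-one "stump" below imputes each missing value by the mean of the observed responses in its leaf (this is not the authors' algorithm, which orders cases and variables lexicographically and grows full trees; names are ours):

```python
import numpy as np

def stump_impute(x, y):
    """Depth-1 'tree': split x at the median of the observed cases and
    impute each missing y by the mean of the observed y on the same side
    of the split (conditional mean imputation within a leaf)."""
    obs = ~np.isnan(y)
    left = x <= np.median(x[obs])
    y_imp = y.copy()
    for side in (left, ~left):
        y_imp[side & ~obs] = y[obs & side].mean()
    return y_imp
```

A deeper tree refines the leaves; the incremental scheme then lets each newly completed variable serve as a predictor for the next.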

2.
The framework of this paper is statistical data editing: specifically, how to edit or impute missing or contradictory data, and how to merge two independent data sets each presenting some lack of information. Assuming a missing-at-random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the Statistical Learning Theory of Vapnik. It considers both an incremental variable imputation method to improve computational efficiency and boosted trees to gain prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus reducing to a minimum the generalization (or prediction) error of imputation. Moreover, the method is distribution free: it holds independently of the underlying probability law generating the missing data values. Performance is discussed for simulation case studies and real-world applications.

3.
A common approach to dealing with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms in which an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed using both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi's homogeneity analysis framework.
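The iterative PCA scheme this builds on can be sketched in a few lines of numpy (a toy version without the paper's regularization; function and variable names are ours):

```python
import numpy as np

def iterative_pca_impute(X, rank=1, n_iter=100):
    """EM-type imputation: alternate a rank-r SVD reconstruction of X
    with re-imputation of the missing cells. Observed cells are never
    altered; only the missing cells are updated each pass."""
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)   # start from column means
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        Xf[miss] = recon[miss]                       # update missing cells only
    return Xf
```

Without regularization this toy version exhibits exactly the overfitting the paper warns about when the rank is set too high relative to the information in the data.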

4.
The objective of this paper is to develop the maximum likelihood approach for analyzing a finite mixture of structural equation models with missing data that are missing at random. A Monte Carlo EM algorithm is proposed for obtaining the maximum likelihood estimates. A well-known statistic in model comparison, namely the Bayesian Information Criterion (BIC), is used for model comparison. With the presence of missing data, the computation of the observed-data likelihood function value involved in the BIC is not straightforward. A procedure based on path sampling is developed to compute this function value. It is shown by means of simulation studies that ignoring the incomplete data with missing entries gives less accurate ML estimates. An illustrative real example is also presented.

5.
In Between Us: On the Transparency and Opacity of Technological Mediation
In recent years several approaches (philosophical, sociological, psychological) have been developed to come to grips with our profoundly technologically mediated world. However, notwithstanding the vast merit of each, they illuminate only certain aspects of technological mediation. This paper is a preliminary attempt at a philosophical reflection on technological mediation as such, deploying the concepts of 'transparency' and 'opacity' as heuristic instruments. Hence, we locate a 'theory of transparency' within several theoretical frameworks: classic phenomenology, media theory, Actor Network Theory, postphenomenology, several ethnographical, psychological, and sociological perspectives, and finally the 'Critical Theory of Technology'. Subsequently, we render a general, systematic overview of these theories, thereby conjecturing what a broad analysis of technological mediation in and of itself might look like, finding, at last, an essential contradiction between transparency of 'use' and transparency of social origins and effects.

6.
Performing Phenomenology: Negotiating Presence in Intermedial Theatre
This paper analyzes, from a pragmatic postphenomenological point of view, the performative practice of CREW, a multi-disciplinary team of artists and researchers. It is our argument that this company, in its use of new immersive technologies in the context of a live stage, gives rise to a dialectics between an embodied and a disembodied perspective towards the perceived world. We will focus on W (Double U), a collaborative interactive performance, where immersive technology is used for live exchange of vision. By means of a head-mounted omni-directional camera and display, the fields of vision of two participants are swapped, which enables each participant to perceive the world through another person's point of view. This intermedial experience causes a classic dichotomous perception of space to falter: material reality as a 'live' condition can no longer be opposed to a virtual mediated reality. In the shifting moment between the embodied and the perceived world, on the fracture between what one sees and what one feels, the distinction between live and mediated is blurred and, moreover, can no longer be made. The perception of the body is pushed to the extreme, causing a most confusing corporeal awareness, a condition that intensifies the experience and causes an altered sense of presence. In a dynamic cognitive negotiation, however, one tends to unify the divergent ontologies of the 'real' and the 'virtual' into a meaningful experience. In this respect, we refer to recent neurological experiments such as the 'rubber hand illusion' in order to clarify the spectator's tendency to fuse both ontologies and to embody a coherent image-world.

7.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with other methods. More specifically, it is studied how these methods behave when order changes of components and sign reversals of component loadings occur, as in the case of near-equal eigenvalues, or data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or are not able to adequately assess the accuracy in the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates for the PCA loadings.
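The core alignment step can be illustrated with an orthogonal Procrustes rotation in numpy (a one-pass sketch of the centroid idea, not the full iterative GPA; names are ours):

```python
import numpy as np

def procrustes_align(A, B):
    """Rotate B to best match A in least squares: solve
    min_R ||A - B R||_F over orthogonal R via the SVD of B'A."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    return B @ (U @ Vt)

def centroid_loadings(loadings):
    """Align every loading matrix to the first one, then average.
    Rotation absorbs the component reorderings and sign flips that
    make naive averaging across imputations fail."""
    ref = loadings[0]
    aligned = [ref] + [procrustes_align(ref, L) for L in loadings[1:]]
    return np.mean(aligned, axis=0)
```

Full GPA would iterate, re-aligning all matrices to the running centroid until it stabilizes.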

8.
It is shown that, if cell weights may be calculated from the data, the chance-corrected Zegers-ten Berge coefficients for metric scales are special cases of Cohen's weighted kappa. The corrected coefficients include Pearson's product-moment correlation, Spearman's rank correlation and the intraclass correlation ICC(3, 1).
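For concreteness, here is a small sketch of Cohen's weighted kappa, the general form those coefficients specialize (the quadratic disagreement weights in the usage example are our choice; implementation details are ours):

```python
import numpy as np

def weighted_kappa(x, y, weights):
    """Cohen's weighted kappa for two ratings on the same scale.
    weights[i][j] is the disagreement weight for category pair (i, j):
    kappa = 1 - (weighted observed disagreement) / (weighted expected
    disagreement under independence of the two marginals)."""
    cats = sorted(set(x) | set(y))
    idx = {c: k for k, c in enumerate(cats)}
    O = np.zeros((len(cats), len(cats)))
    for a, b in zip(x, y):
        O[idx[a], idx[b]] += 1
    O /= len(x)
    E = np.outer(O.sum(axis=1), O.sum(axis=0))  # expected under independence
    W = np.asarray(weights, dtype=float)
    return 1.0 - (W * O).sum() / (W * E).sum()
```

With quadratic weights `W[i][j] = (i - j)**2` the statistic behaves like an intraclass correlation, which is the connection the paper makes precise.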

9.
This paper presents a Bayesian model-based clustering approach for dichotomous item responses that deals with issues often encountered in model-based clustering, such as missing data, large data sets, and within-cluster dependencies. The proposed approach is illustrated using an example concerning Brand Strategy Research.

10.
A k-dissimilarity D on a finite set X, |X| ≥ k, is a map from the set of size-k subsets of X to the real numbers. Such maps naturally arise from edge-weighted trees T with leaf set X: given a subset Y of X of size k, D(Y) is defined to be the total length of the smallest subtree of T with leaf set Y. In case k = 2, it is well known that 2-dissimilarities arising in this way can be characterized by the so-called '4-point condition'. However, in case k > 2, Pachter and Speyer (2004) recently posed the following question: given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k ≥ 3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2k-element subset of X arises from some tree, and that 2k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree, similar to the aforementioned 4-point condition.
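The quantity D(Y) is easy to compute for a concrete tree: repeatedly prune degree-1 vertices outside Y, then sum the surviving edge lengths. A small sketch (data structures and names are ours):

```python
from collections import defaultdict

def subtree_length(edges, Y):
    """Total length of the smallest subtree of a tree spanning leaf set Y.
    edges: iterable of (u, v, weight) triples describing the tree."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    Y = set(Y)
    leaves = [v for v in adj if len(adj[v]) == 1 and v not in Y]
    while leaves:
        v = leaves.pop()
        if v not in adj or len(adj[v]) != 1:
            continue
        u, = adj[v]                       # v's unique remaining neighbour
        del adj[v]
        del adj[u][v]
        if len(adj[u]) == 1 and u not in Y:
            leaves.append(u)
    # each surviving edge is stored twice (once per endpoint)
    return sum(w for nbrs in adj.values() for w in nbrs.values()) / 2
```

For example, on the quartet tree with leaves a, b, c, d joined through internal vertices x and y (unit edge lengths), D({a, b}) = 2 and D({a, b, c}) = 4.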

11.
The additive biclustering model for two-way two-mode object by variable data implies overlapping clusterings of both the objects and the variables together with a weight for each bicluster (i.e., a pair of an object and a variable cluster). In the data analysis, an additive biclustering model is fitted to given data by means of minimizing a least squares loss function. To this end, two alternating least squares algorithms (ALS) may be used: (1) PENCLUS, and (2) Baier's ALS approach. However, both algorithms suffer from some inherent limitations, which may hamper their performance. As a way out, based on theoretical results regarding optimally designing ALS algorithms, in this paper a new ALS algorithm will be presented. In a simulation study this algorithm will be shown to outperform the existing ALS approaches.

12.
In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison's centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
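A minimal sketch of the α-transformation as described above (α = 0 reduces to the centered log-ratio; the Helmert projection usually applied afterwards is omitted, and names are ours):

```python
import numpy as np

def alpha_transform(x, alpha):
    """Alpha-transformation of a composition x (positive entries summing
    to one). alpha = 0 gives Aitchison's centered log-ratio (clr); other
    values interpolate toward treating the data as Euclidean (alpha = 1)."""
    x = np.asarray(x, dtype=float)
    D = x.size
    if alpha == 0:
        return np.log(x) - np.log(x).mean()      # clr: the alpha -> 0 limit
    p = x ** alpha
    return (D * p / p.sum() - 1.0) / alpha
```

The limit claim can be checked directly: for small α the transform is numerically indistinguishable from the clr.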

13.
This paper proposes a new way of overcoming the limitations of existing approaches. It generalizes the model used in previous approaches by introducing a more comprehensive portfolio of covariance matrix structures. Further, this paper proposes a Bayesian solution to the presence of noise in clustering problems. The performance of the proposed method is first studied by simulation; the procedure is also applied to the analysis of data concerning species of butterflies and diabetes patients.

14.
We describe a simple time series transformation to detect differences in series that can be accurately modelled as stationary autoregressive (AR) processes. The transformation involves forming a histogram of the lengths of runs above and below the mean. The run length (RL) transformation has the benefits of being very fast, compact, and updatable for new data in constant time. Furthermore, it can be generated directly from data that has already been highly compressed. We first establish the theoretical asymptotic relationship between run length distributions and AR models through consideration of the zero-crossing probability and the distribution of runs. We benchmark our transformation against two alternatives: the truncated autocorrelation function (ACF) transform and the AR transformation, which involves the standard method of fitting the partial autocorrelation coefficients with the Durbin-Levinson recursions and using the Akaike Information Criterion stopping procedure. Whilst optimal in the idealized scenario, representing the data in these ways is time consuming and the representation cannot be updated online for new data. We show that for classification problems the accuracy obtained using the run length distribution tends towards that obtained from the full fitted models. We then propose three alternative distance measures for run length distributions based on Gower's general similarity coefficient, the likelihood ratio, and dynamic time warping (DTW). Through simulated classification experiments we show that a nearest-neighbour distance based on DTW converges to the optimal faster than classifiers based on Euclidean distance, Gower's coefficient, and the likelihood ratio.
We experiment with a variety of classifiers and demonstrate that although the RL transform requires more data than the best-performing classifier to achieve the same accuracy as AR or ACF, this factor is at worst non-increasing with the series length, m, whereas the relative time taken to fit AR and ACF increases with m. We conclude that if the data is stationary and can be suitably modelled by an AR series, and if time is an important factor in reaching a discriminatory decision, then the run length distribution transform is a simple and effective transformation to use.
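The RL transformation itself is straightforward; a minimal sketch (the bin cap max_len is our choice, not from the paper):

```python
import numpy as np

def run_length_transform(series, max_len=10):
    """Histogram of the lengths of consecutive runs above/below the
    series mean. One O(n) pass; runs longer than max_len go in the
    final bin, so the representation stays compact."""
    x = np.asarray(series, dtype=float)
    above = x >= x.mean()
    hist = np.zeros(max_len, dtype=int)
    run = 1
    for prev, cur in zip(above[:-1], above[1:]):
        if cur == prev:
            run += 1
        else:
            hist[min(run, max_len) - 1] += 1   # close the finished run
            run = 1
    hist[min(run, max_len) - 1] += 1           # close the final run
    return hist
```

Because only the current run length and the histogram need to be kept, the representation can be updated in constant time as new observations arrive, which is the property the abstract emphasizes.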

15.
In this paper we present a way of conducting design of experiments with Multivariate Additive Partial Least-Squares Splines models, in short MAPLSS. In the framework of optimal experimental design based on small samples, in order to select the most informative MAPLSS model, we perform an adaptive incremental selection of observations by a particular bootstrap procedure. Why MAPLSS models? Because they inherit the advantages of PLS regression, which makes it possible to capture additively non-linear main effects and relevant interactions in the difficult framework of small samples. The effectiveness of this approach is illustrated on reservoir simulator data used to forecast oil production.

16.
Without the support of imagination, one would not have the slightest idea of the cruel 'real' that occurred in the Nazi extermination camps. Yet documentaries imaging the events of the Shoah run the risk of missing their most basic property, namely their unimaginability. The mere idea that one is able to imagine the unimaginable amounts to a denial of the Shoah's status as an event that defies our understanding. The unimaginable 'real' of the Shoah, however, is not simply located in its object, in the cruelty of what happened in the camps. The Shoah at the same time confronts us with the unimaginable 'real' of the modern subject: the blind spot in our own identity. If we need imagination to deal with the Shoah, it is also because of an ungraspable 'real' in ourselves. This is why adequate Shoah representations, acknowledging their object as being beyond representation, include the same 'beyond' concerning the subject of Holocaust memory. The essay makes this clear in an extended comparison of Claude Lanzmann's 1985 film, Shoah, with some conceptual works of art from the late nineties, all of this 'fine-tuned' in a reflection upon Ingmar Bergman's Persona.

17.
In this paper we offer a few examples to illustrate the orientation of contemporary research in data analysis and investigate the corresponding role of mathematics. We argue that the modus operandi of data analysis is implicitly based on the belief that if we have collected enough and sufficiently diverse data, we will be able to answer most relevant questions concerning the phenomenon itself. This is a methodological paradigm strongly related, but not limited, to biology, and we label it the microarray paradigm. In this new framework, mathematics provides powerful techniques and general ideas which generate new computational tools. But it lacks any explicit isomorphism between a mathematical structure and the phenomenon under consideration. This methodology suggests the possibility of forecasting and analyzing without a structured and general understanding. This is the perspective we propose to call agnostic science, and we argue that, rather than diminishing or flattening the role of mathematics in science, the lack of isomorphisms with phenomena liberates mathematics, paradoxically making the practical use of some of its most sophisticated ideas more likely.

18.
19.
Block-Relaxation Approaches for Fitting the INDCLUS Model
A well-known clustering model to represent I × I × J data blocks, the J frontal slices of which consist of I × I object-by-object similarity matrices, is the INDCLUS model. This model implies a grouping of the I objects into a prespecified number of overlapping clusters, with each cluster having a slice-specific positive weight. An INDCLUS model is fitted to a given data set by means of minimizing a least squares loss function. The minimization of this loss function has appeared to be a difficult problem for which several algorithmic strategies have been proposed. At present, the best available option seems to be the SYMPRES algorithm, which minimizes the loss function by means of a block-relaxation algorithm. Yet, SYMPRES is conjectured to suffer from a severe local optima problem. As a way out, based on theoretical results with respect to optimally designing block-relaxation algorithms, five alternative block-relaxation algorithms are proposed. In a simulation study it appears that the alternative algorithms with overlapping parameter subsets perform best and clearly outperform SYMPRES in terms of optimization performance and cluster recovery.

20.
An algorithm to maximize the agreement between partitions
