Similar Articles
20 similar articles found
1.
Dimensionally reduced model-based clustering methods have recently received wide interest in statistics as a tool for performing clustering and dimension reduction simultaneously through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, thus reducing the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis, clustering is performed through the factors of an ordinary factor analysis, which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and, consequently, clustering performance. In this work we propose a model which extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models which includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.
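A minimal numpy sketch of the covariance parameterization that makes Mixtures of Factor Analyzers dimensionally reduced: each component covariance is low-rank-plus-diagonal, Sigma_k = Lambda_k Lambda_k' + Psi_k. All names and dimensions below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, K = 10, 2, 3  # observed dimension, latent factors per component, components

def mfa_component_covariance(Lambda_k, psi_k):
    """Low-rank-plus-diagonal covariance of one factor-analyzer component:
    Sigma_k = Lambda_k Lambda_k' + Psi_k."""
    return Lambda_k @ Lambda_k.T + np.diag(psi_k)

Lambdas = [rng.normal(size=(p, q)) for _ in range(K)]
psis = [rng.uniform(0.5, 1.5, size=p) for _ in range(K)]
Sigmas = [mfa_component_covariance(L, s) for L, s in zip(Lambdas, psis)]

# Each component needs only p*q + p covariance parameters instead of p*(p+1)/2.
print([S.shape for S in Sigmas])
print(p * q + p, "vs", p * (p + 1) // 2)
```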

2.
The notion of defining a cluster as a component in a mixture model was put forth by Tiedeman in 1955; since then, the use of mixture models for clustering has grown into an important subfield of classification. Considering that the volume of work in this field over the past decade seems equal to all that went before, a review of work to date is timely. First, the definition of a cluster is discussed and some historical context for model-based clustering is provided. Then, starting with Gaussian mixtures, the evolution of model-based clustering is traced, from the famous paper by Wolfe in 1965 to work that is currently available only in preprint form. This review ends with a look ahead to the next decade or so.

3.
This paper proposes a new way of overcoming the existing limitations. It generalizes the model used in the previous approaches by introducing a more comprehensive portfolio of covariance matrix structures. Further, this paper proposes a Bayesian solution in the presence of noise in clustering problems. The performance of the proposed method is first studied by simulation; the procedure is also applied to the analysis of data concerning species of butterflies and diabetes patients.

4.
MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software and the R language. It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the possible addition of a Poisson noise term. Also included are functions that combine hierarchical clustering, EM and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation, and discriminant analysis. MCLUST provides functionality for displaying and visualizing clustering and classification results. A web page with related links can be found at .
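MCLUST itself is R/S-PLUS software; the sketch below only mimics one part of its strategy (fitting Gaussian mixtures under several covariance parameterizations and numbers of components, then selecting by BIC) using scikit-learn's GaussianMixture rather than the MCLUST API. Data and settings are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 0.5, (80, 2))])

# BIC-based selection over covariance structures and component counts,
# loosely mirroring MCLUST's model-selection strategy (not its API).
best = None
for cov_type in ("spherical", "diag", "tied", "full"):
    for k in range(1, 6):
        gm = GaussianMixture(n_components=k, covariance_type=cov_type,
                             random_state=0).fit(X)
        bic = gm.bic(X)  # in scikit-learn, lower BIC is better
        if best is None or bic < best[0]:
            best = (bic, cov_type, k, gm)

bic, cov_type, k, model = best
print(f"selected: {k} components, '{cov_type}' covariance, BIC={bic:.1f}")
labels = model.predict(X)  # hard classification of the data
```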

5.
6.
The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.
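A rough sketch of the sample-then-classify strategy described above, assuming scikit-learn's GaussianMixture: several tentative models are fitted to a random sample (here simply via different random starts), each is refined with a few EM iterations on the full data, and the best-scoring model classifies everything. Sample size and other settings are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X_full = np.vstack([rng.normal(0, 1, (20000, 3)), rng.normal(3, 1, (15000, 3))])

# Step 1: cluster a random sample of moderate size (about 2,000).
sample = X_full[rng.choice(len(X_full), size=2000, replace=False)]

# Step 2: several tentative models from the sample (different random starts),
# then a few EM iterations on the full data rather than a single E step.
candidates = []
for seed in range(3):
    init = GaussianMixture(n_components=2, random_state=seed).fit(sample)
    refined = GaussianMixture(n_components=2, max_iter=5,
                              weights_init=init.weights_,
                              means_init=init.means_,
                              precisions_init=init.precisions_).fit(X_full)
    candidates.append((refined.score(X_full), refined))

# Step 3: keep the best-scoring model and classify the full data set.
loglik, best_model = max(candidates, key=lambda t: t[0])
labels = best_model.predict(X_full)
print(f"best average log-likelihood: {loglik:.3f}")
```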

7.
8.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, that is, curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach that maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach that optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic generalization of the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives, including regression mixture models and the K-means-like algorithm for piecewise regression, demonstrate the effectiveness of the proposed approach.
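The dynamic-programming segmentation step can be illustrated in isolation. The sketch below segments a single curve into a fixed number of contiguous segments, each fitted by polynomial least squares, by minimizing the total residual sum of squares; the mixture/EM machinery of the paper is not shown, and all settings are illustrative.

```python
import numpy as np

def segment_cost(t, y, degree):
    """Residual sum of squares of a polynomial fit on one segment."""
    coefs = np.polyfit(t, y, degree)
    return float(np.sum((y - np.polyval(coefs, t)) ** 2))

def optimal_segmentation(t, y, n_segments, degree=2, min_len=4):
    """Classical DP over change points minimizing total within-segment RSS."""
    n = len(y)
    # cost[i, j] = best cost of splitting y[:j] into i segments
    cost = np.full((n_segments + 1, n + 1), np.inf)
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for i in range(1, n_segments + 1):
        for j in range(i * min_len, n + 1):
            for s in range((i - 1) * min_len, j - min_len + 1):
                c = cost[i - 1, s] + segment_cost(t[s:j], y[s:j], degree)
                if c < cost[i, j]:
                    cost[i, j], back[i, j] = c, s
    # Recover the right endpoints (exclusive) of the segments.
    cuts, j = [], n
    for i in range(n_segments, 0, -1):
        cuts.append(j)
        j = back[i, j]
    return sorted(cuts)

t = np.linspace(0, 1, 120)
y = np.where(t < 0.4, 1.0, np.where(t < 0.7, 3.0, 0.5)) \
    + 0.1 * np.random.default_rng(3).normal(size=t.size)
print(optimal_segmentation(t, y, n_segments=3))
```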

9.
An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to very high frequency time series segmentation and modeling.
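One simple way to quantify the degree of ultrametricity, in the spirit of the abstract though not necessarily the paper's exact coefficient, is to sample triangles and count how often the two largest pairwise distances are nearly equal (the ultrametric triangle property). The sketch below does this for Gaussian clouds of increasing dimension; the tolerance and sample sizes are illustrative.

```python
import numpy as np

def ultrametricity_degree(X, n_triples=2000, rtol=0.05, rng=None):
    """Fraction of sampled triangles that are approximately ultrametric,
    i.e. whose two largest pairwise distances are (nearly) equal."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    hits = 0
    for _ in range(n_triples):
        i, j, k = rng.choice(n, size=3, replace=False)
        d = sorted([np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])])
        hits += abs(d[2] - d[1]) <= rtol * d[2]
    return hits / n_triples

rng = np.random.default_rng(4)
for dim in (2, 20, 200):
    X = rng.normal(size=(300, dim))
    print(dim, round(ultrametricity_degree(X, rng=rng), 3))
```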

10.
A simple proof of the identification of a mixture of two univariate normal distributions is given. The proof is based on the equivalence of local identification with positive definiteness of the information matrix and the equivalence of the latter to a condition on the score vector that is easily checked for this model. Two extensions using the same line of proof are also given.
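The criterion mentioned above, positive definiteness of the information matrix at the parameter point, can be checked numerically for a concrete two-component normal mixture. The sketch below approximates the information matrix by the average outer product of numerically differentiated score vectors; it is an illustration of the criterion, not the paper's analytic proof, and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def score(theta, x, eps=1e-6):
    """Numerical score vector of a two-component normal mixture log-density."""
    def logf(th):
        p, m1, s1, m2, s2 = th
        return np.log(p * norm.pdf(x, m1, s1) + (1 - p) * norm.pdf(x, m2, s2))
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        up, down = np.array(theta, float), np.array(theta, float)
        up[i] += eps
        down[i] -= eps
        g[i] = (logf(up) - logf(down)) / (2 * eps)
    return g

theta = [0.4, 0.0, 1.0, 3.0, 0.7]   # (weight, mu1, sigma1, mu2, sigma2)
rng = np.random.default_rng(5)
n = 4000
z = rng.random(n) < theta[0]
x = np.where(z, rng.normal(theta[1], theta[2], n), rng.normal(theta[3], theta[4], n))

# Empirical information matrix: average outer product of per-observation scores.
# A strictly positive smallest eigenvalue indicates local identification at theta.
S = np.array([score(theta, xi) for xi in x])
info = S.T @ S / n
print("smallest eigenvalue:", np.linalg.eigvalsh(info).min())
```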

11.
12.
MCLUST: Software for Model-Based Cluster Analysis

13.
What I call theoretical abduction (sentential and model-based) certainly illustrates much of what is important in abductive reasoning, especially the objective of selecting and creating a set of hypotheses that are able to dispense good (preferred) explanations of data, but fails to account for many cases of explanation occurring in science or in everyday reasoning when the exploitation of the environment is crucial. The concept of manipulative abduction is devoted to capturing the role of action in many interesting situations: action provides otherwise unavailable information that enables the agent to solve problems by starting and performing a suitable abductive process of generation or selection of hypotheses. Many external things, usually inert from the epistemological point of view, can be transformed into what I call epistemic mediators, which are illustrated in the last part of the paper, together with an analysis of the related notions of 'perceptual and inceptual rehearsal' and of 'external representation'.

14.
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering algorithms are based upon determining the best variable subspace according to model fitting in a stepwise manner. These techniques are often computationally intensive and can require extended periods of time to run; in fact, some are prohibitively computationally expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering/classification methods. Our approach is illustrated on both simulated and real data, highlighted by contrasting its performance with that of other comparable variable selection techniques on the real data sets.

15.
Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector, it is preferable to think of the data as functions, and many classical multivariate techniques have been generalized to this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Therefore, given a functional basis, an approach to curve clustering relies on applying the k-means methodology to the fitted basis coefficients corresponding to all the curves in the data set. Unfortunately, a serious drawback follows from the lack of robustness of k-means. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matran 1997) provides a robust alternative to the use of k-means and, consequently, it may be successfully used in this functional framework. The proposed approach will be exemplified by considering cubic B-splines bases, but other bases can be applied analogously depending on the application at hand.
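A sketch of the basis-projection-plus-trimmed-k-means idea, assuming scipy's make_lsq_spline for cubic B-spline fitting and a crude hand-rolled trimming step; it is only a rough analogue of the trimmed k-means of Cuesta-Albertos, Gordaliza, and Matran (1997), with illustrative data and settings.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 100)
curves = np.array([np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=t.size) if i < 25
                   else np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=t.size)
                   for i in range(50)])
curves[0] += 5.0  # an outlying curve that trimming should discard

# Project every curve on a common cubic B-spline basis and keep the coefficients.
knots = np.r_[[0.0] * 4, np.linspace(0.1, 0.9, 8), [1.0] * 4]
coefs = np.array([make_lsq_spline(t, y, knots, k=3).c for y in curves])

def trimmed_kmeans(X, k, trim=0.1, n_iter=50, rng=None):
    """Crude trimmed k-means: drop the trim-fraction of points farthest from
    their nearest centre before each centre update."""
    rng = rng or np.random.default_rng(0)
    centres = X[rng.choice(len(X), k, replace=False)]
    keep = np.arange(len(X))
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        lab, dist = d.argmin(1), d.min(1)
        keep = np.argsort(dist)[: int((1 - trim) * len(X))]
        new_centres = []
        for j in range(k):
            members = X[keep][lab[keep] == j]
            new_centres.append(members.mean(0) if len(members) else centres[j])
        centres = np.array(new_centres)
    return lab, keep

labels, kept = trimmed_kmeans(coefs, k=2, rng=rng)
print("trimmed curves:", sorted(set(range(len(curves))) - set(kept)))
```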

16.
We examine the problem of aggregating several partitions of a finite set into a single consensus partition. We note that the dual concepts of clustering and isolation are especially significant in this connection. The hypothesis that a consensus partition should respect unanimity with respect to either concept leads us to stress a consensus interval rather than a single partition. The extremes of this interval are characterized axiomatically. If a sufficient totality of traits has been measured, and if measurement errors are independent, then a true classifying partition can be expected to lie in the consensus interval. The structure of the partitions in the interval lends itself to partial solutions of the consensus problem. Conditional entropy may be used to quantify the uncertainty inherent in the interval as a whole.
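Two natural bounds for a consensus of partitions are the coarsest common refinement (objects together only if every input partition puts them together) and the finest common coarsening (objects together if any input partition links them, transitively); the conditional entropy of one given the other gives a rough measure of the remaining uncertainty. The sketch below, using numpy and scipy, is an illustration in this spirit, not the paper's axiomatic construction.

```python
import numpy as np
from collections import Counter
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def meet(partitions):
    """Coarsest common refinement: objects agree iff all partitions agree."""
    keys = list(zip(*partitions))
    relabel = {key: i for i, key in enumerate(dict.fromkeys(keys))}
    return np.array([relabel[k] for k in keys])

def join(partitions):
    """Finest common coarsening: merge objects linked in any partition."""
    n = len(partitions[0])
    rows, cols = [], []
    for p in partitions:
        p = np.asarray(p)
        for c in np.unique(p):
            idx = np.flatnonzero(p == c)
            rows.extend(idx[:-1]); cols.extend(idx[1:])  # chain the block
    graph = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    return connected_components(graph, directed=False)[1]

def conditional_entropy(fine, coarse):
    """H(fine | coarse) in bits: uncertainty left inside the interval."""
    n = len(fine)
    joint, marg, h = Counter(zip(coarse, fine)), Counter(coarse), 0.0
    for (c, f), nij in joint.items():
        h -= (nij / n) * np.log2(nij / marg[c])
    return h

P1 = [0, 0, 1, 1, 2, 2]
P2 = [0, 0, 0, 1, 1, 2]
print("meet:", meet([P1, P2]))
print("join:", join([P1, P2]))
print("H(meet | join) =", round(conditional_entropy(meet([P1, P2]), join([P1, P2])), 3))
```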

17.
Clustering criteria for discrete data and latent class models
We show that a well-known clustering criterion for discrete data, the information criterion, is closely related to the classification maximum likelihood criterion for the latent class model. This relation can be derived from the Bryant-Windham construction. Emphasis is placed on binary clustering criteria which are analyzed under the maximum likelihood approach for different multivariate Bernoulli mixtures. This alternative form of criterion reveals non-apparent aspects of clustering techniques. All the criteria discussed can be optimized with the alternating optimization algorithm. Some illustrative applications are included.
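A minimal sketch of a classification EM (CEM) algorithm for a mixture of multivariate Bernoulli distributions, the latent class setting that the abstract relates to the classification maximum likelihood criterion; this is a generic CEM with illustrative data, not the specific criteria analyzed in the paper.

```python
import numpy as np

def cem_bernoulli(X, k, n_iter=50, eps=1e-6, rng=None):
    """Classification EM for a mixture of multivariate Bernoulli distributions."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    theta = rng.uniform(0.3, 0.7, size=(k, d))   # success probabilities
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E step: log posterior class probabilities (up to a constant)
        logp = (np.log(pi)[None]
                + X @ np.log(theta + eps).T
                + (1 - X) @ np.log(1 - theta + eps).T)
        # C step: hard assignment (this is what makes it *classification* EM)
        z = logp.argmax(1)
        # M step: update parameters from the hard partition
        for j in range(k):
            members = X[z == j]
            if len(members):
                theta[j] = members.mean(0)
                pi[j] = len(members) / n
    return z, theta, pi

rng = np.random.default_rng(7)
A = rng.random((100, 8)) < 0.8
B = rng.random((100, 8)) < 0.2
X = np.vstack([A, B]).astype(float)
labels, theta, pi = cem_bernoulli(X, k=2, rng=rng)
print(np.round(theta, 2), np.round(pi, 2))
```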

18.
Kuhn's paradigm theory reveals a pattern of scientific development in which normal science and scientific revolutions alternate, but it gives too little weight to the role of scientific testing in this process, because Kuhn had in mind only the hypothetico-deductive model of testing and not the Bayesian one. This paper re-examines Kuhn's paradigm theory using a Bayesian testing model, exhibits its logical or rational components, and thereby clarifies the tension between logic and belief, and between rationality and irrationality, in the course of scientific revolutions.
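The contrast between hypothetico-deductive and Bayesian testing can be made concrete with a toy posterior update; the numbers below are purely illustrative and not drawn from the paper.

```python
# Toy Bayesian updating of confidence in a reigning paradigm H1 versus a rival H2.
prior_h1 = 0.9        # strong initial belief in the paradigm
likelihood_h1 = 0.05  # probability of the anomalous evidence under H1
likelihood_h2 = 0.60  # probability of the same evidence under H2

posterior_h1 = (likelihood_h1 * prior_h1) / (
    likelihood_h1 * prior_h1 + likelihood_h2 * (1 - prior_h1))
print(f"belief in H1 after the anomaly: {posterior_h1:.2f}")  # ~0.43
```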

19.
20.
Clustering of multivariate spatial-time series should consider: 1) the spatial nature of the objects to be clustered; 2) the characteristics of the feature space, namely the space of multivariate time trajectories; 3) the uncertainty associated with the assignment of a spatial unit to a given cluster on the basis of the above complex features. The last aspect is dealt with by using the Fuzzy C-Means objective function, based on appropriate measures of dissimilarity between time trajectories that distinguish the cross-sectional and longitudinal aspects of the trajectories. In order to take into account the spatial nature of the statistical units, a spatial penalization term is added to the above function, depending on a suitable spatial proximity/contiguity matrix. A tuning coefficient takes care of the balance between, on one side, discriminating according to the pattern of the time trajectories and, on the other side, ensuring an approximate spatial homogeneity of the clusters. A technique for determining an optimal value of this coefficient is proposed, based on an appropriate spatial autocorrelation measure. Finally, the proposed models are applied to the classification of the Italian provinces, on the basis of the observed dynamics of some socio-economical indicators.
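A rough sketch of a fuzzy C-means update with a heuristic spatial penalty driven by a contiguity matrix W and a tuning coefficient beta, in the spirit of the abstract; the dissimilarity here is plain Euclidean distance rather than the paper's cross-sectional/longitudinal dissimilarities for time trajectories, and the penalty form is an assumption.

```python
import numpy as np

def spatial_fuzzy_cmeans(X, W, k, beta=0.5, m=2.0, n_iter=100, rng=None):
    """Fuzzy C-Means with a heuristic spatial penalty: the effective distance of
    unit i to cluster c grows when i's neighbours (per contiguity matrix W) put
    their membership in other clusters. beta balances feature-space fit against
    spatial homogeneity."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    U = rng.dirichlet(np.ones(k), size=n)          # fuzzy memberships
    for _ in range(n_iter):
        Um = U ** m
        C = (Um.T @ X) / Um.sum(0)[:, None]        # weighted centroids
        d2 = ((X[:, None] - C[None]) ** 2).sum(2)  # squared distances to centroids
        penalty = W @ (1 - U)                      # neighbours' membership elsewhere
        eff = d2 + beta * penalty + 1e-12
        U = eff ** (-1 / (m - 1))                  # standard FCM-style update
        U /= U.sum(1, keepdims=True)
    return U, C

# Tiny example: 6 units on a line, contiguity = adjacent units.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
U, C = spatial_fuzzy_cmeans(X, W, k=2, beta=0.3)
print(np.round(U, 2))
```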
