首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
A latent class vector model for preference ratings   总被引:1,自引:1,他引:1  
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and some possible extensions are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”  相似文献   

2.
A probabilistic DEDICOM model was proposed for mobility tables. The model attempts to explain observed transition probabilities by a latent mobility table and a set of transition probabilities from latent classes to observed classes. The model captures asymmetry in observed mobility tables by asymmetric latent mobility tables. It may be viewed as a special case of both the latent class model and DEDICOM with special constraints. A maximum penalized likelihood (MPL) method was developed for parameter estimation. The EM algorithm was adapted for the MPL estimation. Two examples were given to illustrate the proposed method. The work reported in this paper has been supported by grant A6394 to the first author from the Natural Sciences and Engineering Research Council of Canada and by a fellowship of the Royal Netherlands Academy of Arts and Sciences to the second author. We would like to thank anonymous reviewers for their insightful comments.  相似文献   

3.
A latent class probit model for analyzing pick any/N data   总被引:3,自引:3,他引:0  
A latent class probit model is developed in which it is assumed that the binary data of a particular subject follow a finite mixture of multivariate Bermoulli distributions. An EM algorithm for fitting the model is described and a Monte Carlo procedure for testing the number of latent classes that is required for adequately describing the data is discussed. In the final section, an application of the latent class probit model to some intended purchase data for residential telecommunication devices is reported. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”  相似文献   

4.
Probabilistic feature models (PFMs) can be used to explain binary rater judgements about the associations between two types of elements (e.g., objects and attributes) on the basis of binary latent features. In particular, to explain observed object-attribute associations PFMs assume that respondents classify both objects and attributes with respect to a, usually small, number of binary latent features, and that the observed object-attribute association is derived as a specific mapping of these classifications. Standard PFMs assume that the object-attribute association probability is the same according to all respondents, and that all observations are statistically independent. As both assumptions may be unrealistic, a multilevel latent class extension of PFMs is proposed which allows objects and/or attribute parameters to be different across latent rater classes, and which allows to model dependencies between associations with a common object (attribute) by assuming that the link between features and objects (attributes) is fixed across judgements. Formal relationships with existing multilevel latent class models for binary three-way data are described. As an illustration, the models are used to study rater differences in product perception and to investigate individual differences in the situational determinants of anger-related behavior.  相似文献   

5.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that only requires specification of an item-by-attribute association matrix, and classifies according to minimizing a distance measure between observed responses, and the ideal response for a given attribute profile that would be implied by the item-by-attribute association matrix. This procedure requires no statistical parameter estimation, and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.  相似文献   

6.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, which are curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, then yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach and optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives including regression mixture models and the K-means-like algorithm for piecewise regression demonstrate the effectiveness of the proposed approach.  相似文献   

7.
In this paper we provide an explicit probability distribution for classification purposes when observations are viewed on the real line and classifications are to be based on numerical orderings. The classification model is derived from a Bayesian nonparametric mixture of Dirichlet process model; with some modifications. The resulting approach then more closely resembles a classical hierarchical grouping rule in that it depends on sums of squares of neighboring values. The proposed probability model for classification relies on a numerical procedure based on a reversible Markov chain Monte Carlo (MCMC) algorithm for determining the probabilities. Some numerical illustrations comparing with alternative ideas for classification are provided.  相似文献   

8.
Traditionally latent class (LC) analysis is used by applied researchers as a tool for identifying substantively meaningful clusters. More recently, LC models have also been used as a density estimation tool for categorical variables. We introduce a divisive LC (DLC) model as a density estimation tool that may offer several advantages in comparison to a standard LC model. When using an LC model for density estimation, a considerable number of increasingly large LC models may have to be estimated before sufficient model-fit is achieved. A DLC model consists of a sequence of small LC models. Therefore, a DLC model can be estimated much faster and can easily utilize multiple processor cores, meaning that this model is more widely applicable and practical. In this study we describe the algorithm of fitting a DLC model, and discuss the various settings that indirectly influence the precision of a DLC model as a density estimation tool. These settings are illustrated using a synthetic data example, and the best performing algorithm is applied to a real-data example. The generated data example showed that, using specific decision rules, a DLC model is able to correctly model complex associations amongst categorical variables.  相似文献   

9.
Dimensionally reduced model-based clustering methods are recently receiving a wide interest in statistics as a tool for performing simultaneously clustering and dimension reduction through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, thus reducing the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis clustering is performed through the factors of an ordinary factor analysis which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and consequently clustering performance. In this work we propose a model which extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models which includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.  相似文献   

10.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.  相似文献   

11.
T clusters, based on J distinct, contributory partitions (or, equivalently, J polytomous attributes). We describe a new model/algorithm for implementing this objective. The method's objective function incorporates a modified Rand measure, both in initial cluster selection and in subsequent refinement of the starting partition. The method is applied to both synthetic and real data. The performance of the proposed model is compared to latent class analysis of the same data set.  相似文献   

12.
A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2) allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature including Caroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.  相似文献   

13.
Power and Sample Size Computation for Wald Tests in Latent Class Models   总被引:1,自引:0,他引:1  
Latent class (LC) analysis is used by social, behavioral, and medical science researchers among others as a tool for clustering (or unsupervised classification) with categorical response variables, for analyzing the agreement between multiple raters, for evaluating the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for modeling heterogeneity in developmental trajectories. Despite the increased popularity of LC analysis, little is known about statistical power and required sample size in LC modeling. This paper shows how to perform power and sample size computations in LC models using Wald tests for the parameters describing association between the categorical latent variable and the response variables. Moreover, the design factors affecting the statistical power of these Wald tests are studied. More specifically, we show how design factors which are specific for LC analysis, such as the number of classes, the class proportions, and the number of response variables, affect the information matrix. The proposed power computation approach is illustrated using realistic scenarios for the design factors. A simulation study conducted to assess the performance of the proposed power analysis procedure shows that it performs well in all situations one may encounter in practice.  相似文献   

14.
On characterizing optimization-based clustering methods   总被引:1,自引:1,他引:0  
This paper suggests a simplification of a recent approach suggested by Windham to characterizing optimization-based clustering methods. The simplification is based on noting an analogy between certain quantities in Windham's formulation and corresponding quantities in mathematical statistics, particularly sufficient statistics and the exponential family of densities.He thanks an anonymous referee for several helpful comments.  相似文献   

15.
K-modes Clustering   总被引:2,自引:0,他引:2  
0 norm (defined as the limit of an Lp norm as p approaches zero). In Monte Carlo simulations, both K-modes and the latent class procedures (e.g., Goodman 1974) performed with equal efficiency in recovering a known underlying cluster structure. However, K-modes is an order of magnitude faster than the latent class procedure in speed and suffers from fewer problems of local optima than do the latent class procedures. For data sets involving a large number of categorical variables, latent class procedures become computationally extremly slow and hence infeasible. We conjecture that, although in some cases latent class procedures might perform better than K-modes, it could out-perform latent class procedures in other cases. Hence, we recommend that these two approaches be used as "complementary" procedures in performing cluster analysis. We also present an empirical comparison of K-modes and latent class, where the former method prevails.  相似文献   

16.
17.
A mathematical programming approach to fitting general graphs   总被引:1,自引:1,他引:0  
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.  相似文献   

18.
The DINA model is a commonly used model for obtaining diagnostic information. Like many other Diagnostic Classification Models (DCMs), it can require a large sample size to obtain reliable item and examinee parameter estimation. Neural Network (NN) analysis is a classification method that uses a training dataset for calibration. As a result, if this training dataset is determined theoretically, as was the case in Gierl’s attribute hierarchical method (AHM), the NN analysis does not have any sample size requirements. However, a NN approach does not provide traditional item parameters of a DCM or allow for item responses to influence test calibration. In this paper, the NN approach will be implemented for the DINA model estimation to explore its effectiveness as a classification method beyond its use in AHM. The accuracy of the NN approach across different sample sizes, item quality and Q-matrix complexity is described in the DINA model context. Then, a Markov Chain Monte Carlo (MCMC) estimation algorithm and Joint Maximum Likelihood Estimation is used to extend the NN approach so that item parameters associated with the DINA model are obtained while allowing examinee responses to influence the test calibration. The results derived by the NN, the combination of MCMC and NN (NN MCMC) and the combination of JMLE and NN are compared with that of the well-established Hierarchical MCMC procedure and JMLE with a uniform prior on the attribute profile to illustrate their strength and weakness.  相似文献   

19.
This study evaluates performance of information criteria used to separate latent classes. In the evaluations, various numbers of latent classes, sample sizes, parameter structures and latent-class complexities were designed to simulate datasets. The average accuracy rates of information criteria in selecting the designed numbers of latent classes were the core results in this experiment. The study revealed that widely used information criteria, e.g., AIC, BIC, CAIC, could perform poorly under some circumstances. By including a sample size adjustment (Rissanen, 1978), the unsatis-factory performances could be improved considerably. The sample size adjustment provides a plausible solution for separating latent classes. Guidelines are provided to help achieve optimum use of the model fit indices.  相似文献   

20.
We propose a non-negative real-valued model of hierarchical classes (HICLAS) for two-way two-mode data. Like the other members of the HICLAS family, the non-negative real-valued model (NNRV-HICLAS) implies simultaneous hierarchically organized classifications of all modes involved in the data. A distinctive feature of the novel model is that it yields continuous, non-negative real-valued reconstructed data, which considerably expands the application range of the HICLAS family. The expansion implies a major algorithmic challenge as it involves a move from the typical discrete optimization problems in HICLAS to a mixed discrete-continuous one. To solve this mixed discrete-continuous optimization problem, a two-stage algorithm combining a simulated annealing and an alternating local descent stage is proposed. Subsequently it is evaluated in a simulation study. Finally, the NNRVHICLAS model is applied to an empirical data set on anger.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号