Similar Articles
A total of 20 similar articles were found.
1.
Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However, maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as a result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.  相似文献
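As a rough sketch of the MAP idea (illustrative code of my own, using a simplified univariate model and an inverse-gamma prior on the component variances rather than the paper's full conjugate prior), the M-step below replaces the maximum likelihood variance by its posterior mode, so it cannot collapse to zero, and the BIC-type score is then evaluated at the resulting MAP estimate:

    import numpy as np
    from scipy.stats import norm

    def map_em_bic(x, K, a=1.0, b=None, n_iter=200, seed=0):
        # EM for a univariate K-component normal mixture with an
        # inverse-gamma IG(a, b) prior on each component variance.
        # Hypothetical sketch: weights and means keep the usual ML updates,
        # while the variance update is the posterior mode (MAP), which stays
        # away from zero and so avoids singularities.
        rng = np.random.default_rng(seed)
        x = np.asarray(x, float)
        n = x.size
        if b is None:
            b = 0.01 * x.var()                      # weak, data-scaled prior
        w = np.full(K, 1.0 / K)
        mu = rng.choice(x, K, replace=False)
        var = np.full(K, x.var())
        for _ in range(n_iter):
            dens = np.array([w[k] * norm.pdf(x, mu[k], np.sqrt(var[k])) for k in range(K)])
            r = dens / dens.sum(axis=0)             # E-step: responsibilities
            Nk = r.sum(axis=1)
            w, mu = Nk / n, (r @ x) / Nk            # M-step (ML for weights, means)
            Sk = (r * (x - mu[:, None]) ** 2).sum(axis=1)
            var = (2 * b + Sk) / (2 * a + Nk + 2)   # mode of IG(a + Nk/2, b + Sk/2)
        dens = np.array([w[k] * norm.pdf(x, mu[k], np.sqrt(var[k])) for k in range(K)])
        loglik = np.log(dens.sum(axis=0)).sum()
        p = 3 * K - 1                               # free parameters for K components
        return 2 * loglik - p * np.log(n)           # BIC evaluated at the MAP

Comparing this score across candidate values of K (larger is better under the 2·logL − p·log n convention) then selects the number of components, mirroring the modified-BIC proposal.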

2.
Using a natural metric on the space of networks, we define a probability measure for network-valued random variables. This measure is indexed by two parameters, which are interpretable as a location parameter and a dispersion parameter. From this structure, one can develop maximum likelihood estimates, hypothesis tests and confidence regions, all in the context of independent and identically distributed networks. The value of this perspective is illustrated through application to portions of the friendship cognitive social structure data gathered by Krackhardt (1987). We thank Ove Frank, David Krackhardt, the editor and the referees for their constructive comments and suggestions.  相似文献
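A minimal sketch of this kind of model, under the assumption (mine, for illustration) that the metric is the Hamming distance between adjacency matrices: each network G receives probability proportional to exp(−τ·d(G, G0)), with G0 acting as the location parameter and τ as the dispersion parameter. For the Hamming metric, the maximum likelihood estimate of G0 from an i.i.d. sample is simply the majority-rule graph:

    import numpy as np

    def hamming(A, B):
        # number of dyads on which two binary adjacency matrices disagree
        return int(np.sum(A != B))

    def ml_location(sample):
        # sample: (m, n, n) array of observed adjacency matrices.
        # Under P(G) proportional to exp(-tau * hamming(G, G0)), the MLE of
        # G0 keeps an edge iff more than half of the observed networks have it.
        sample = np.asarray(sample)
        return (sample.mean(axis=0) > 0.5).astype(int)

    rng = np.random.default_rng(1)                  # toy usage
    nets = rng.integers(0, 2, size=(5, 4, 4))       # five 4-node digraphs
    G0 = ml_location(nets)
    print(G0, [hamming(G, G0) for G in nets])

Estimating the dispersion τ additionally requires the normalizing constant of the measure (a sum over all graphs), which this toy omits; the hypothesis tests and confidence regions mentioned in the abstract are developed from this same structure.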

3.
Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first one yields the conditional maximum likelihood estimates which can be used in cases when the length of time series is substantial. Simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields the exact maximum likelihood estimates. The procedure for assessing variability in obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and correctly assess time series orders. The performance of the developed methodology is evaluated on simulation studies. An application to the analysis of tree ring data is thoroughly considered. The results are very promising as the proposed approach overcomes the limitations of other methods developed so far.  相似文献

4.
Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.  相似文献   

5.
A mixture likelihood approach for generalized linear models
A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is presented, and the model is illustrated in two empirical applications.  相似文献

6.
A maximum likelihood methodology for clusterwise linear regression
This paper presents a conditional mixture, maximum likelihood methodology for performing clusterwise linear regression. This new methodology simultaneously estimates separate regression functions and membership in K clusters or groups. A review of related procedures is discussed with an associated critique. The conditional mixture, maximum likelihood methodology is introduced together with the EM algorithm utilized for parameter estimation. A Monte Carlo analysis is performed via a fractional factorial design to examine the performance of the procedure. Next, a marketing application is presented concerning the evaluations of trade show performance by senior marketing executives. Finally, other potential applications and directions for future research are identified.  相似文献
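To make the conditional-mixture idea concrete, here is a hypothetical, stripped-down EM for a mixture of K linear regressions with normal errors (names and defaults are mine; it is a sketch of the general approach, not the authors' algorithm):

    import numpy as np

    def clusterwise_lr(X, y, K, n_iter=100, seed=0):
        # EM for y | x ~ N(x' beta_k, sigma_k^2) with mixing proportions pi_k.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        beta = rng.normal(size=(K, d))
        sigma2 = np.full(K, y.var())
        pi = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            resid = y[None, :] - beta @ X.T                     # K x n residuals
            dens = pi[:, None] * np.exp(-0.5 * resid ** 2 / sigma2[:, None]) \
                   / np.sqrt(2 * np.pi * sigma2[:, None])
            r = dens / dens.sum(axis=0)                         # E-step: memberships
            for k in range(K):                                  # M-step: weighted LS
                W = r[k]
                beta[k] = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
                sigma2[k] = np.sum(W * (y - X @ beta[k]) ** 2) / W.sum()
            pi = r.sum(axis=1) / n
        return pi, beta, sigma2, r

    rng = np.random.default_rng(2)                              # toy usage: two regimes
    X = np.column_stack([np.ones(300), rng.uniform(-1, 1, 300)])
    y = np.where(rng.random(300) < 0.5, 1 + 2 * X[:, 1], 4 - 3 * X[:, 1]) + rng.normal(0, 0.2, 300)
    print(clusterwise_lr(X, y, K=2)[1])                         # recovered coefficients

The posterior membership probabilities r double as a fuzzy clustering of the observations, which is the sense in which regression functions and cluster membership are estimated simultaneously.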

7.
We investigate the effects of a complex sampling design on the estimation of mixture models. An approximate or pseudo likelihood approach is proposed to obtain consistent estimates of class-specific parameters when the sample arises from such a complex design. The effects of ignoring the sample design are demonstrated empirically in the context of an international value segmentation study in which a multinomial mixture model is applied to identify segment-level value rankings. The analysis reveals that ignoring the sample design results in both an incorrect number of segments as identified by information criteria and biased estimates of segment-level parameters.  相似文献   
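The pseudo-likelihood correction essentially weights each case's contribution to the complete-data log-likelihood by its design (sampling) weight. A hypothetical sketch for a latent class model with J categorical items (names and simplifications are mine, not the paper's exact multinomial specification):

    import numpy as np

    def weighted_lca(Y, w, K, C, n_iter=200, seed=0):
        # Y: (n, J) integer item responses in {0, ..., C-1}; w: design weights.
        # EM in which every M-step sufficient statistic is weighted by w,
        # giving pseudo-maximum-likelihood estimates under a complex design.
        rng = np.random.default_rng(seed)
        n, J = Y.shape
        theta = rng.dirichlet(np.ones(C), size=(K, J))   # P(item j = c | class k)
        pi = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            logp = np.tile(np.log(pi)[:, None], (1, n))  # E-step: class posteriors
            for j in range(J):
                logp += np.log(theta[:, j, Y[:, j]])
            r = np.exp(logp - logp.max(axis=0))
            r /= r.sum(axis=0)
            rw = r * w[None, :]                          # M-step: weighted statistics
            pi = rw.sum(axis=1) / w.sum()
            for j in range(J):
                for c in range(C):
                    theta[:, j, c] = rw[:, Y[:, j] == c].sum(axis=1) / rw.sum(axis=1)
        return pi, theta

Setting all weights to 1 recovers the ordinary unweighted EM; the abstract's point is that doing so under a complex design distorts both the information-criterion choice of the number of segments and the segment-level parameters.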

8.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with other methods. More specifically, we study how these methods behave when order changes of components and sign reversals of component loadings occur, such as in the case of near-equal eigenvalues, or data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or are not able to adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates for the PCA loadings.  相似文献
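A rough sketch of the combination step (hypothetical code; scipy's orthogonal_procrustes supplies the per-matrix rotation, and the function name and defaults are mine): the loading matrices from the imputed data sets are iteratively rotated/reflected toward their centroid, and that centroid is taken as the combined loading estimate.

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def gpa_loadings(loadings, n_iter=100, tol=1e-10):
        # loadings: list of (variables x components) matrices, one per
        # imputed data set. Generalized Procrustes with rotations/reflections
        # only: align every matrix to the current centroid, update the
        # centroid, and repeat until the fit stops improving.
        mats = [np.asarray(L, float) for L in loadings]
        centroid = mats[0].copy()
        prev = np.inf
        for _ in range(n_iter):
            rotated = [L @ orthogonal_procrustes(L, centroid)[0] for L in mats]
            centroid = np.mean(rotated, axis=0)
            loss = sum(np.sum((L - centroid) ** 2) for L in rotated)
            if prev - loss < tol:
                break
            prev = loss
        return centroid, rotated

The aligned solutions also give the per-loading scatter around the centroid from which the convex hulls mentioned above can be drawn to visualize the uncertainty due to the missing data.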

9.
The INDSCAL individual differences scaling model is extended by assuming dimensions specific to each stimulus or other object, as well as dimensions common to all stimuli or objects. An alternating maximum likelihood procedure is used to seek maximum likelihood estimates of all parameters of this EXSCAL (Extended INDSCAL) model, including parameters of monotone splines assumed in a quasi-nonmetric approach. The rationale for and numerical details of this approach are described and discussed, and the resulting EXSCAL method is illustrated on some data on perception of musical timbres.  相似文献   

10.
Percept variance is shown to change the additive property of city-block distances and make city-block distances more subadditive than Euclidean distances. Failure to account for percept variance will result in the misclassification of city-block data as Euclidean. A maximum likelihood estimation procedure is proposed for the multidimensional scaling of similarity data characterized by percept variance. Monte Carlo and empirical experiments are used to evaluate the proposed approach.  相似文献   
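To see why noise pushes observed city-block data toward apparent subadditivity, here is a small Monte Carlo toy example (my own construction, not taken from the paper). Three stimuli A, B, C are placed so that the true city-block distances are exactly additive, d(A,C) = d(A,B) + d(B,C); adding independent percept noise inflates short distances more than long ones, so the average perceived direct distance falls short of the sum of the perceived legs:

    import numpy as np

    rng = np.random.default_rng(0)
    A, B, C = np.array([0., 0.]), np.array([1., 1.]), np.array([2., 2.])
    sigma, reps = 0.5, 100_000                  # percept noise and replications

    def perceived_cityblock(P, Q):
        # average city-block distance between noisy percepts of P and Q
        Pn = P + rng.normal(0, sigma, size=(reps, 2))
        Qn = Q + rng.normal(0, sigma, size=(reps, 2))
        return np.abs(Pn - Qn).sum(axis=1).mean()

    dAB, dBC, dAC = (perceived_cityblock(A, B),
                     perceived_cityblock(B, C),
                     perceived_cityblock(A, C))
    print(dAB + dBC, dAC)                       # the sum exceeds dAC: subadditivity

Ignoring this inflation, the data look subadditive and are therefore misclassified as Euclidean, which is the failure mode the proposed maximum likelihood procedure is designed to avoid.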

11.
Unfolding creates configurations from preference information. In this paper, it is argued that not all preference information needs to be collected and that good solutions are still obtained, even when more than half of the data is missing. Simulation studies are conducted to compare missing data treatments, sources of missing data, and designs for the specification of missing data. Guidelines are provided and used in actual practice.  相似文献   

12.
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity analysis framework.  相似文献   
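The core iterative-imputation loop is easy to state for plain PCA (a generic sketch of the EM-style idea; the regularized iterative MCA itself, with the appropriate indicator-matrix weighting, is what the R package missMDA implements): alternate a low-rank SVD fit of the completed matrix with re-imputation of the missing cells from that fit.

    import numpy as np

    def iterative_pca_impute(X, rank, n_iter=500, tol=1e-9):
        # X: numeric matrix with np.nan marking missing cells.
        X = np.asarray(X, float)
        miss = np.isnan(X)
        Xc = np.where(miss, np.nanmean(X, axis=0), X)    # start from column means
        for _ in range(n_iter):
            mu = Xc.mean(axis=0)
            U, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            # The regularized variant of the algorithm shrinks the singular
            # values here to control the overfitting discussed above.
            fit = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
            new = np.where(miss, fit, X)
            if np.sum((new - Xc) ** 2) < tol:
                return new
            Xc = new
        return Xc

Without the shrinkage step the imputed values can track noise in the observed cells, which is exactly the overfitting problem that motivates the regularized version assessed in the paper.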

13.
A probabilistic DEDICOM model was proposed for mobility tables. The model attempts to explain observed transition probabilities by a latent mobility table and a set of transition probabilities from latent classes to observed classes. The model captures asymmetry in observed mobility tables by asymmetric latent mobility tables. It may be viewed as a special case of both the latent class model and DEDICOM with special constraints. A maximum penalized likelihood (MPL) method was developed for parameter estimation. The EM algorithm was adapted for the MPL estimation. Two examples were given to illustrate the proposed method. The work reported in this paper has been supported by grant A6394 to the first author from the Natural Sciences and Engineering Research Council of Canada and by a fellowship of the Royal Netherlands Academy of Arts and Sciences to the second author. We would like to thank anonymous reviewers for their insightful comments.  相似文献   

14.
The framework of this paper is statistical data editing: specifically, how to edit or impute missing or contradictory data and how to merge two independent data sets that each present some lack of information. Assuming a missing-at-random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the Statistical Learning Theory of Vapnik. It considers both an incremental variable imputation method, to improve computational efficiency, and boosted trees, to gain prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus minimizing the generalization (or prediction) error of imputation. Moreover, the method is distribution-free: it holds independently of the underlying probability law generating the missing data values. Performance analysis is discussed considering simulation case studies and real world applications.  相似文献
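As a hedged illustration of the tree-based idea (my own sketch using scikit-learn's gradient boosting, not the authors' implementation), missing entries in each variable are predicted from the other variables, proceeding incrementally from the least to the most incomplete column:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def boosted_impute(X, n_estimators=200, seed=0):
        # X: numeric array with np.nan for missing entries (assumed MAR).
        X = np.asarray(X, float)
        miss = np.isnan(X)
        order = np.argsort(miss.sum(axis=0))             # incremental variable order
        Xc = np.where(miss, np.nanmean(X, axis=0), X)    # crude starting fill
        for j in order:
            if not miss[:, j].any():
                continue
            others = np.delete(np.arange(X.shape[1]), j)
            model = GradientBoostingRegressor(n_estimators=n_estimators,
                                              random_state=seed)
            model.fit(Xc[~miss[:, j]][:, others], X[~miss[:, j], j])
            Xc[miss[:, j], j] = model.predict(Xc[miss[:, j]][:, others])
        return Xc

Categorical variables would use a boosted classifier instead, and for data fusion the same predictors, trained on the donor file, are applied to the recipient file to fill in the variables it lacks.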

15.
The mixture method of clustering applied to three-way data
Clustering or classifying individuals into groups such that there is relative homogeneity within the groups and heterogeneity between the groups is a problem which has been considered for many years. Most available clustering techniques are applicable only to a two-way data set, where one of the modes is to be partitioned into groups on the basis of the other mode. Suppose, however, that the data set is three-way. Then what is needed is a multivariate technique which will cluster one of the modes on the basis of both of the other modes simultaneously. It is shown that by appropriate specification of the underlying model, the mixture maximum likelihood approach to clustering can be applied in the context of a three-way table. It is illustrated using a soybean data set which consists of multiattribute measurements on a number of genotypes each grown in several environments. Although the problem is set in the framework of clustering genotypes, the technique is applicable to other types of three-way data sets.  相似文献   
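One quick way to see the clustering target in practice (an illustrative shortcut of mine, not the paper's three-way mixture model itself): with genotypes as the mode to be clustered, the environment and attribute modes can be unfolded into one long vector per genotype and a normal mixture fitted to those vectors.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(3)                    # toy stand-in for the data
    data = rng.normal(size=(50, 4, 6))                # genotypes x environments x attributes
    flat = data.reshape(data.shape[0], -1)            # unfold the last two modes

    gm = GaussianMixture(n_components=3, covariance_type="diag",
                         random_state=0).fit(flat)
    print(np.bincount(gm.predict(flat)))              # cluster sizes for the genotypes

The paper's point is that an appropriate specification of the underlying model lets the mixture maximum likelihood approach respect the three-way structure directly; the unfolded fit above merely shows what is being sought, namely groups of genotypes that behave homogeneously across all environments and attributes at once.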

16.
This paper presents a Bayesian model based clustering approach for dichotomous item responses that deals with issues often encountered in model based clustering like missing data, large data sets and within cluster dependencies. The approach proposed will be illustrated using an example concerning Brand Strategy Research.  相似文献   

17.
Clustering criteria for discrete data and latent class models
We show that a well-known clustering criterion for discrete data, the information criterion, is closely related to the classification maximum likelihood criterion for the latent class model. This relation can be derived from the Bryant-Windham construction. Emphasis is placed on binary clustering criteria which are analyzed under the maximum likelihood approach for different multivariate Bernoulli mixtures. This alternative form of criterion reveals non-apparent aspects of clustering techniques. All the criteria discussed can be optimized with the alternating optimization algorithm. Some illustrative applications are included.
Résumé: We show that the information clustering criterion, often used for discrete data, is closely related to the classification maximum likelihood criterion applied to the latent class model. This link can be analyzed through the Bryant-Windham parameterization. Emphasis is placed on the case of binary data, which are analyzed under the maximum likelihood approach for mixtures of multivariate Bernoulli distributions. This form of the criterion brings to light hidden aspects of binary data clustering methods. All of the criteria considered here can be optimized with the alternating optimization algorithm. Examples conclude the article.
  相似文献   
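To make the classification-likelihood connection tangible, here is a hypothetical classification EM (CEM) for a mixture of multivariate Bernoulli distributions (my sketch; variable names and defaults are not from the paper). The hard assignment step means the quantity being increased is the classification maximum likelihood criterion rather than the ordinary mixture likelihood:

    import numpy as np

    def cem_bernoulli(Y, K, n_iter=100, seed=0, eps=1e-6):
        # Y: (n, J) binary data matrix. Alternates hard assignment of each
        # object to its most probable class with ML re-estimation of the
        # class proportions and Bernoulli parameters within the partition.
        rng = np.random.default_rng(seed)
        n, J = Y.shape
        pi = np.full(K, 1.0 / K)
        theta = np.clip(rng.random((K, J)), eps, 1 - eps)
        for _ in range(n_iter):
            logp = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T + np.log(pi)
            z = logp.argmax(axis=1)                    # C-step: hard partition
            for k in range(K):                         # M-step within classes
                members = z == k
                if members.any():                      # empty classes keep old values
                    pi[k] = members.mean()
                    theta[k] = np.clip(Y[members].mean(axis=0), eps, 1 - eps)
        return z, pi, theta, logp[np.arange(n), z].sum()   # classification log-likelihood

Replacing the hard argmax by soft responsibilities (and weighting the M-step by them) turns this into ordinary mixture-likelihood EM for the latent class model, which is precisely the relationship between the clustering criterion and the latent class model analyzed above.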

18.
A Thurstonian model for ranks is introduced in which rank-induced dependencies are specified through correlation coefficients among ranked objects that are determined by a vector of rank-induced parameters. The ranking model can be expressed in terms of univariate normal distribution functions, thus simplifying a previously computationally intensive problem. A theorem is proven that shows that the specification given in the paper for the dependencies is the only way that this simplification can be achieved under the process assumptions of the model. The model depends on certain conditional probabilities that arise from item orders considered by subjects as they make ranking decisions. Examples involving a complete set of ranks and a set with missing values are used to illustrate recovery of the objects’ scale values and the rank dependency parameters. Application of the model to ranks for gift items presented singly or as composite items is also discussed.  相似文献   

19.
Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement for the imputation procedure proposed by Lavori, Dawson, and Shera (1995), here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the "propensity score step" of ABP, randomly selects a donor in the nonrespondent's neighbourhood, which includes cases with response patterns similar to the one of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likert-type scales from a real data set shows that ABPN: (a) performed better than the ABP imputation, and (b) can be considered as a serious competitor of other procedures used in this context.  相似文献
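A hypothetical sketch of the two-stage idea (function, variable names and defaults are mine): a logistic regression first estimates each case's propensity to be a nonrespondent on the item and cases are grouped into propensity-score classes; then, instead of drawing a donor completely at random within the class as ABP would, ABPN draws it from the nonrespondent's nearest neighbours in terms of the observed response pattern.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def abpn_single_impute(X, y, n_classes=5, n_neighbours=5, seed=0):
        # X: fully observed items/covariates (n x p); y: item with np.nan
        # where nonresponse occurred. Single-imputation version of the idea.
        rng = np.random.default_rng(seed)
        y = np.asarray(y, float).copy()
        miss = np.isnan(y)
        # Step 1: propensity of nonresponse, cut into equal-frequency classes
        ps = LogisticRegression(max_iter=1000).fit(X, miss).predict_proba(X)[:, 1]
        edges = np.quantile(ps, np.linspace(0, 1, n_classes + 1))
        cls = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_classes - 1)
        # Step 2: donor drawn among nearest neighbours inside the same class
        for i in np.where(miss)[0]:
            donors = np.where(~miss & (cls == cls[i]))[0]
            if donors.size == 0:
                donors = np.where(~miss)[0]             # fall back to all respondents
            dist = np.abs(X[donors] - X[i]).sum(axis=1) # similarity of response patterns
            nearest = donors[np.argsort(dist)[:n_neighbours]]
            y[i] = y[rng.choice(nearest)]
        return y

Repeating the procedure on approximate-Bayesian-bootstrap resamples of the respondents within each propensity class would give the multiple-imputation counterpart that the original ABP procedure targets.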

20.
A modified CANDECOMP algorithm is presented for fitting the metric version of the Extended INDSCAL model to three-way proximity data. The Extended INDSCAL model assumes, in addition to the common dimensions, a unique dimension for each object. The modified CANDECOMP algorithm fits the Extended INDSCAL model in a dimension-wise fashion and ensures that the subject weights for the common and the unique dimensions are nonnegative. A Monte Carlo study is reported to illustrate that the method is fairly insensitive to the choice of the initial parameter estimates. A second Monte Carlo study shows that the method is able to recover an underlying Extended INDSCAL structure if present in the data. Finally, the method is applied for illustrative purposes to some empirical data on pain relievers. In the final section, some other possible uses of the new method are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijik Onderzoek”.  相似文献   
