On characterizing optimization-based clustering methods   Total citations: 1, self-citations: 1, cited by others: 0  
This paper proposes a simplification of a recent approach, due to Windham, to characterizing optimization-based clustering methods. The simplification rests on an analogy between certain quantities in Windham's formulation and corresponding quantities in mathematical statistics, particularly sufficient statistics and the exponential family of densities. The author thanks an anonymous referee for several helpful comments.

Analysis of between-group differences using canonical variates assumes equality of the population covariance matrices. Sometimes these matrices differ enough for the null hypothesis of equality to be rejected, yet they share common features that should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and it is shown to be appropriate in a practical example. Two methods of between-group analysis are proposed for use when this model replaces the equal-dispersion-matrix assumption. The first extends the two-stage approach to canonical variate analysis, using sequential principal component analyses as described by Campbell and Atchley (1981). The second defines a distance function between populations satisfying the common principal component model and then applies metric scaling to the resulting between-population distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the data set introduced earlier.

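Causes of and countermeasures for the confusion of flower names in China   Total citations: 2, self-citations: 0, cited by others: 2  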
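China has an abundant variety of floral germplasm resources and is universally recognized as the "Mother of Gardens". For historical, social and human reasons, however, synonymy (one plant under several names) and homonymy (several plants under one name) are widespread; the Latin names and Chinese common names of many flowers are inconsistent, and the names used on the domestic flower market are also confused. All of this causes trouble, and even losses, for the production, marketing, international exchange and scientific study of Chinese flowers. Following the International Code of Botanical Nomenclature and the International Code of Nomenclature for Cultivated Plants, this paper lists a number of species and cultivar names in Spiraea that do not conform to the codes, sets out the principles that the naming of Chinese flowers should follow, analyses the causes of the confusion, and proposes several measures to avoid it.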

In educational measurement, cognitive diagnosis models have been developed to allow assessment of specific skills that are needed to perform tasks. Skill knowledge is characterized as present or absent and represented by a vector of binary indicators, or the skill set profile. After determining which skills are needed for each assessment item, a model is specified for the relationship between item responses and skill set profiles. Cognitive diagnosis models are often used for diagnosis, that is, for classifying students into the different skill set profiles. Generally, cognitive diagnosis models do not exploit student covariate information. However, investigating the effects of student covariates, such as gender, SES, or educational interventions, on skill knowledge mastery is important in education research, and covariate information may improve classification of students to skill set profiles. We extend a common cognitive diagnosis model, the DINA model, by modeling the relationship between the latent skill knowledge indicators and covariates. The probability of skill mastery is modeled as a logistic regression model, possibly with a student-level random intercept, giving a higher-order DINA model with a latent regression. Simulations show that parameter recovery is good for these models and that inclusion of covariates can improve skill diagnosis. When applying our methods to data from an online tutor, we obtain reasonable and interpretable parameter estimates that allow more detailed characterization of groups of students who differ in their predicted skill set profiles.
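
To make the model structure concrete, here is a minimal Python sketch of the DINA item response function and of a logistic (higher-order) model for skill mastery. The function names, the per-skill intercept and the optional random-intercept argument are illustrative assumptions, not the authors' code or exact parametrization; parameter estimation, which the paper addresses, is not shown.

```python
import math


def dina_prob_correct(alpha, q_row, slip, guess):
    """P(correct response) for one item under the DINA model.

    eta = 1 only if the student masters every skill the item requires
    (every k with q_row[k] == 1 has alpha[k] == 1): then P = 1 - slip;
    otherwise the student can only guess, so P = guess.
    """
    eta = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if eta else guess


def response_likelihood(responses, alpha, Q, slips, guesses):
    """Probability of a 0/1 response vector given a skill set profile alpha."""
    lik = 1.0
    for x, q_row, s, g in zip(responses, Q, slips, guesses):
        p = dina_prob_correct(alpha, q_row, s, g)
        lik *= p if x == 1 else 1.0 - p
    return lik


def mastery_prob(covariates, coefs, skill_intercept, random_intercept=0.0):
    """Latent regression: logistic model for P(alpha_k = 1 | covariates).

    `random_intercept` is a hypothetical stand-in for the optional
    student-level random effect.
    """
    z = skill_intercept + random_intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with Q = [[1, 0], [0, 1]], slips of 0.1 and guesses of 0.2, a student with profile [1, 0] who answers [1, 0] has likelihood 0.9 * 0.8 = 0.72.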

Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, little attention has so far been paid to methods for combining the results of principal component analyses (PCA) applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose; its centroid solution can be used as the final estimate of the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with that of other methods. More specifically, the studies examine how these methods behave when order changes of components and sign reversals of component loadings occur, as in the case of near-equal eigenvalues or of data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or cannot adequately assess the loss of accuracy caused by the missing data. However, when the above situations do not occur, all methods provide adequate estimates of the PCA loadings.
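
The alignment step at the heart of this proposal can be illustrated with a minimal, dependency-free Python sketch for the two-component case. It is a simplified generalized Procrustes procedure (orthogonal rotations and reflections only; no translation, scaling or convex hulls), and all function names are hypothetical. Because reflections are allowed, a column swap (order change) or a sign reversal of a component is undone exactly.

```python
import math


def _apply(X, Q):
    """Right-multiply a p x 2 matrix X (list of rows) by the 2 x 2 matrix Q."""
    return [[x0 * Q[0][0] + x1 * Q[1][0], x0 * Q[0][1] + x1 * Q[1][1]] for x0, x1 in X]


def procrustes_2d(X, Y):
    """Return X Q, where Q is the 2 x 2 orthogonal matrix minimizing ||X Q - Y||^2."""
    # M = X^T Y; the fit is best when sum_ab Q_ab * M_ab is largest
    m11 = sum(x[0] * y[0] for x, y in zip(X, Y))
    m12 = sum(x[0] * y[1] for x, y in zip(X, Y))
    m21 = sum(x[1] * y[0] for x, y in zip(X, Y))
    m22 = sum(x[1] * y[1] for x, y in zip(X, Y))
    # Best proper rotation [[c, -s], [s, c]] and best reflection [[c, s], [s, -c]]
    rot_score = math.hypot(m11 + m22, m21 - m12)
    ref_score = math.hypot(m11 - m22, m12 + m21)
    if rot_score >= ref_score:
        t = math.atan2(m21 - m12, m11 + m22)
        Q = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
    else:
        t = math.atan2(m12 + m21, m11 - m22)
        Q = [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]
    return _apply(X, Q)


def generalized_procrustes(loadings_list, n_iter=20):
    """Align every loading matrix to the running centroid, then re-average."""
    centroid = [row[:] for row in loadings_list[0]]
    aligned = loadings_list
    for _ in range(n_iter):
        aligned = [procrustes_2d(L, centroid) for L in loadings_list]
        centroid = [
            [sum(mat[i][j] for mat in aligned) / len(aligned) for j in range(2)]
            for i in range(len(centroid))
        ]
    return centroid, aligned
```

With more than two components one would normally obtain the orthogonal Procrustes matrix from a singular value decomposition instead of the closed-form angle used here.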

Models for the representation of proximity data (similarities/dissimilarities) can be categorized into three groups: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for the spatial representation of such proximity data, have been devised to accommodate two-, three-, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering, INDCLUS (Carroll and Arabie 1983), has been devised for three-way arrays of proximity data. Tree-fitting methods, used for the discrete network representation of such proximity data, have thus far been devised only for two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various (discrete) tree structures to three-way proximity data. In this individual differences generalization, different individuals, for example, are assumed to base their judgments on the same family of trees but are allowed to have different node heights and/or branch lengths. We first present an introductory overview focusing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling and to handle other types of applications.
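
The individual-differences idea (one common tree family, subject-specific node heights) can be illustrated with a small Python sketch that turns node heights into an ultrametric for each subject and checks the ultrametric inequality. The tree, the heights and all names are hypothetical, and the sketch shows only the model structure, not the INDTREES fitting algorithm.

```python
from itertools import combinations

# Hypothetical common topology shared by all subjects: internal node -> children.
# Leaves are the objects a..e; internal nodes are n1, n2, n3 and root.
TREE = {"root": ["n1", "n2"], "n1": ["a", "b"], "n2": ["c", "n3"], "n3": ["d", "e"]}


def leaves_under(node, tree):
    """Set of leaves in the subtree rooted at `node`."""
    if node not in tree:
        return {node}
    out = set()
    for child in tree[node]:
        out |= leaves_under(child, tree)
    return out


def ultrametric(tree, heights):
    """Subject-specific distances: d(i, j) = height of the lowest common ancestor.

    Assumes heights increase toward the root, so the lowest common ancestor
    is the common ancestor with the smallest height.
    """
    leaves = sorted(leaves_under("root", tree))
    return {
        (i, j): min(heights[n] for n in tree if {i, j} <= leaves_under(n, tree))
        for i, j in combinations(leaves, 2)
    }


def is_ultrametric(d, leaves, tol=1e-12):
    """Check that in every triple the two largest distances are equal."""
    def dist(x, y):
        return d[(x, y)] if (x, y) in d else d[(y, x)]
    for i, j, k in combinations(leaves, 3):
        s = sorted([dist(i, j), dist(i, k), dist(j, k)])
        if abs(s[1] - s[2]) > tol:
            return False
    return True
```

Passing the same TREE with two different height dictionaries yields two subjects whose ultrametrics share a topology but differ in their values, which is the kind of variation the model allows.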

L2-norm: (1) dynamic programming; (2) an iterative quadratic assignment improvement heuristic; (3) the Guttman update strategy as modified by Pliner's technique of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and Tse. All of the methods are implemented as (freely downloadable) MATLAB m-files, and their use is illustrated on a common data set carried throughout. For the computationally intensive dynamic programming formulation, which can guarantee a globally optimal solution, several possible computational improvements are discussed and evaluated: (a) translating a given m-function into C code with the MATLAB Compiler and compiling the result; (b) rewriting an m-function and the mandatory MATLAB gateway directly in Fortran and compiling them into a MATLAB-callable file; (c) comparing the acceleration of raw m-files under the most recent release, MATLAB Version 6.5, with the absence of such acceleration under the previous Version 6.1. Finally, in contrast to the combinatorial optimization task of identifying a best unidimensional scaling for a given proximity matrix, approaches are given for the confirmatory fitting of a unidimensional scaling based only on a fixed object ordering, and for nonmetric unidimensional scaling that incorporates an additional optimal monotonic transformation of the proximities.
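
For the confirmatory case mentioned at the end (a fixed object ordering), the least-squares coordinates under the L2 loss have a simple closed form, which we take here from the standard unidimensional scaling literature as an assumption: with the order fixed and the coordinates centered, the k-th object sits at (sum of its proximities to objects placed before it minus sum to objects placed after it) / n. The Python sketch below (hypothetical names; not the authors' MATLAB m-files) computes that solution and, for very small n only, brute-forces the best order, which is the search that the dynamic programming approach carries out far more efficiently.

```python
from itertools import permutations


def fit_fixed_order(P, order):
    """Least-squares L2 unidimensional scaling for a fixed object order.

    P is a symmetric proximity matrix (list of lists); `order` lists object
    indices from left to right. Returns centered coordinates and the
    residual sum of squares.
    """
    n = len(order)
    x = {}
    for pos, obj in enumerate(order):
        before = sum(P[obj][other] for other in order[:pos])
        after = sum(P[obj][other] for other in order[pos + 1:])
        x[obj] = (before - after) / n
    # Evaluate with |x_i - x_j| so that an order contradicted by the data is penalized honestly
    loss = sum(
        (P[i][j] - abs(x[i] - x[j])) ** 2
        for a, i in enumerate(order)
        for j in order[a + 1:]
    )
    return x, loss


def best_order_bruteforce(P):
    """Exhaustive search over all orders (feasible only for a handful of objects)."""
    return min(permutations(range(len(P))), key=lambda o: fit_fixed_order(P, list(o))[1])
```

For proximities generated from the points 0, 1 and 3, the order (0, 1, 2) gives the coordinates -4/3, -1/3 and 5/3, which reproduce every proximity exactly.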

A common approach to dealing with missing values in multivariate exploratory data analysis consists of minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms that iteratively impute the missing values while the axes and components are estimated. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real data set. Results are promising compared with other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity analysis framework.

Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures are developed for this framework. The first yields conditional maximum likelihood estimates, which can be used when the time series are long; simple analytical expressions make fast parameter estimation possible. The second incorporates the Kalman filter and yields exact maximum likelihood estimates. A procedure for assessing the variability of the estimates is discussed. We also show that the Bayesian information criterion can be used successfully to choose the optimal number of mixture components and to assess the time series orders correctly. The performance of the methodology is evaluated in simulation studies, and an application to the analysis of tree ring data is considered in detail. The results are very promising, as the proposed approach overcomes the limitations of other methods developed so far.

We investigate the effects of a complex sampling design on the estimation of mixture models. An approximate or pseudo likelihood approach is proposed to obtain consistent estimates of class-specific parameters when the sample arises from such a complex design. The effects of ignoring the sample design are demonstrated empirically in the context of an international value segmentation study in which a multinomial mixture model is applied to identify segment-level value rankings. The analysis reveals that ignoring the sample design results in both an incorrect number of segments as identified by information criteria and biased estimates of segment-level parameters.
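
A common way to build such a pseudo-likelihood is to weight each respondent's log-likelihood contribution by his or her survey (design) weight; we assume that form here, and it may differ in detail from the paper's estimator. The Python sketch below evaluates the weighted log-likelihood of a latent class (multinomial mixture) model and performs one weighted EM step. The data layout (thetas[k][j][c] is the probability of category c on item j in class k) and all names are hypothetical.

```python
import math


def _class_joint(y, pis, thetas):
    """pi_k * prod_j theta[k][j][y_j] for every class k."""
    return [pk * math.prod(thetas[k][j][c] for j, c in enumerate(y)) for k, pk in enumerate(pis)]


def pseudo_loglik(data, weights, pis, thetas):
    """Design-weighted log-likelihood: sum_i w_i * log f(y_i)."""
    return sum(w * math.log(sum(_class_joint(y, pis, thetas))) for y, w in zip(data, weights))


def weighted_em_step(data, weights, pis, thetas):
    """One EM update in which respondent i counts w_i times."""
    post = []
    for y in data:
        joint = _class_joint(y, pis, thetas)
        total = sum(joint)
        post.append([p / total for p in joint])  # E-step: class posteriors
    K = len(pis)
    mass = [sum(w * t[k] for w, t in zip(weights, post)) for k in range(K)]
    new_pis = [m / sum(weights) for m in mass]
    new_thetas = [
        [
            [
                sum(w * t[k] for y, w, t in zip(data, weights, post) if y[j] == c) / mass[k]
                for c in range(len(thetas[k][j]))
            ]
            for j in range(len(data[0]))
        ]
        for k in range(K)
    ]
    return new_pis, new_thetas
```

Setting every weight to 1 recovers the ordinary (design-ignoring) log-likelihood and EM step, which is the comparison the study draws.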

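Based on Incoterms 2000 (the International Rules for the Interpretation of Trade Terms), drawn up by the International Chamber of Commerce in 1999 and in force since 1 January 2000, this paper interprets the 13 trade terms currently in use and compares the differences between Incoterms 2000 and Incoterms 1990.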

Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.
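
As a concrete baseline, assume the Banks and Carley form P(G) proportional to exp(-lam * d(G, G0)) with d the Hamming distance (the number of vertex pairs on which two graphs disagree). Under that metric the model is equivalent to flipping each vertex pair of the central graph G0 independently with probability p = 1 / (1 + exp(lam)), so the maximum likelihood central graph is the edgewise majority graph. The Python sketch below (hypothetical names; undirected graphs stored as sets of frozenset edges) fits only that special case; the point of the paper is precisely to allow other, application-specific metrics.

```python
import math
from itertools import combinations


def hamming(g1, g2):
    """Number of vertex pairs on which two undirected graphs disagree."""
    return len(g1 ^ g2)


def fit_hamming_model(graphs, n_vertices):
    """MLE of the central graph, the flip probability p, and lam = log((1 - p) / p)."""
    pairs = [frozenset(p) for p in combinations(range(n_vertices), 2)]
    m = len(graphs)
    # Central graph: keep a pair as an edge if it appears in a strict majority of graphs
    center = {e for e in pairs if 2 * sum(e in g for g in graphs) > m}
    # Flip probability: share of (graph, pair) combinations that differ from the center
    p_hat = sum(hamming(g, center) for g in graphs) / (m * len(pairs))
    lam = math.inf if p_hat == 0 else math.log((1 - p_hat) / p_hat)
    return center, p_hat, lam
```

Because the central graph plays the role of a location parameter and lam that of a (inverse) scale parameter, this special case already shows the kind of interpretable parametrization the abstract asks for.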

On offering terminology courses at universities   Total citations: 4, self-citations: 2, cited by others: 2  
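Internationally, terminology science is a mature, independent discipline with a well-developed education system. In China, although great achievements have been made in the practical work of terminology standardization, research and education in terminology science are still in their infancy. China should now strive to offer terminology courses at universities. Such a course would not only train personnel for terminology standardization practice and theoretical research in China, but would also help improve students' overall quality in four respects: first, it helps cultivate students' love of their mother tongue; second, it enhances their scientific and humanistic literacy; third, it teaches systematic terminology theory and thus broadens employment opportunities; and fourth, it helps students form the habit of using standardized terms.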

A modified CANDECOMP algorithm is presented for fitting the metric version of the Extended INDSCAL model to three-way proximity data. The Extended INDSCAL model assumes, in addition to the common dimensions, a unique dimension for each object. The modified CANDECOMP algorithm fits the Extended INDSCAL model dimension by dimension and ensures that the subject weights for the common and the unique dimensions are nonnegative. A Monte Carlo study illustrates that the method is fairly insensitive to the choice of initial parameter estimates. A second Monte Carlo study shows that the method is able to recover an underlying Extended INDSCAL structure if one is present in the data. Finally, the method is applied, for illustrative purposes, to some empirical data on pain relievers. The final section discusses some other possible uses of the new method. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek”.
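
The model itself can be written down compactly. In the Python sketch below we assume one common parametrization: each object i has common coordinates X[i] plus a coordinate u[i] on its own unique dimension, and each subject has nonnegative weights w on the common dimensions and v on the unique ones, so that the squared distance adds v[i]*u[i]**2 + v[j]*u[j]**2 to the weighted INDSCAL term. The names and this parametrization are assumptions, and the modified CANDECOMP fitting algorithm is not shown.

```python
import math


def extended_indscal_distance(X, u, w, v, i, j):
    """Distance between objects i and j for one subject (assumed parametrization)."""
    if i == j:
        return 0.0  # i and j share the same unique dimension, so nothing is added
    if min(w) < 0 or min(v) < 0:
        raise ValueError("subject weights must be nonnegative")
    common = sum(wt * (X[i][t] - X[j][t]) ** 2 for t, wt in enumerate(w))
    unique = v[i] * u[i] ** 2 + v[j] * u[j] ** 2
    return math.sqrt(common + unique)


def subject_distances(X, u, w, v):
    """Full distance matrix implied for one subject."""
    n = len(X)
    return [[extended_indscal_distance(X, u, w, v, i, j) for j in range(n)] for i in range(n)]
```

With all unique-dimension weights set to zero the function reduces to the ordinary weighted Euclidean distance of INDSCAL.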

Direct multicriteria clustering algorithms   Total citations: 1, self-citations: 0, cited by others: 1  
In a multicriteria clustering problem, optimization over more than one criterion is required. The problem can be treated in different ways: by reduction to a clustering problem with a single criterion obtained as a combination of the given criteria; by constrained clustering algorithms, where a selected criterion is taken as the clustering criterion and all the others determine the constraints; or by direct algorithms. In this paper two types of direct algorithms for solving the multicriteria clustering problem are proposed: the modified relocation algorithm and the modified agglomerative algorithm. Different elaborations of these two types of algorithms are discussed and compared. Finally, two applications of the proposed algorithms are presented. This is an elaborated version of talks presented at the First Conference of the International Federation of Classification Societies, Aachen, 1987, at the International Conference on Social Science Methodology, Dubrovnik, 1988, and at the Second Conference of the International Federation of Classification Societies, Charlottesville, 1989. This work was supported in part by the Research Council of Slovenia.
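
What counts as a solution in a multicriteria clustering problem can be made concrete with Pareto efficiency: a partition is kept if no other partition is at least as good on every criterion and strictly better on at least one. The Python sketch below enumerates all two-cluster partitions of a tiny data set and keeps the nondominated ones under two hypothetical criteria (within-cluster sums of squares on two different variables). It is a brute-force illustration only, not the modified relocation or agglomerative algorithms proposed in the paper.

```python
from itertools import product


def sse(values):
    """Within-group sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def criteria(labels, var1, var2):
    """Two criteria to minimize: within-cluster SSE on var1 and on var2."""
    c1 = c2 = 0.0
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        c1 += sse([var1[i] for i in idx])
        c2 += sse([var2[i] for i in idx])
    return c1, c2


def dominates(a, b):
    """True if criterion vector a is Pareto-better than b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_partitions(var1, var2):
    """All nondominated two-cluster partitions (object 0 fixed in cluster 0)."""
    n = len(var1)
    labelings = [(0,) + rest for rest in product((0, 1), repeat=n - 1) if 1 in rest]
    scored = [(lab, criteria(lab, var1, var2)) for lab in labelings]
    return [(lab, c) for lab, c in scored if not any(dominates(other, c) for _, other in scored)]
```

For four objects at (0, 0), (0, 10), (10, 0) and (10, 10), the split by the first variable and the split by the second both survive, while the "diagonal" split is dominated by each of them; this is why a single combined criterion can hide useful alternatives.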

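Medical English terms are formed in many ways, for example from personal and place names, from myths and allusions, by semantic association, by transliteration, and by metaphor. Metaphor uses common everyday objects to stand for abstruse and complex medical concepts, so that the meaning of a disease or anatomical structure can be inferred from the features of a familiar thing. Because it is simple and intuitive, medical researchers have used metaphor since ancient times to coin a large number of medical English words. Taking a metaphorical perspective, this article collects and examines a series of medical English words, in the hope of offering a useful reference for research on medical English.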

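Strategies for standardizing the English translation of traditional Chinese medicine disease names   Total citations: 2, self-citations: 0, cited by others: 2  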
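By analysing the characteristics of disease naming in Chinese and Western medicine and the principles of translating traditional Chinese medicine (TCM) into English, this paper discusses in some depth strategies for translating TCM disease names into English. Given the various correspondences between the meanings of TCM and Western medical disease names, it adopts a strategy of free (sense-for-sense) translation as the first choice, literal translation as the second choice, restricted use of transliteration, and a combination of several methods. It explains in detail how to apply the three translation methods flexibly to the various TCM disease names.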

Weighting and selection of variables for cluster analysis   Total citations: 1, self-citations: 0, cited by others: 1  
One of the thorniest aspects of cluster analysis continues to be the weighting and selection of variables. This paper reports on the performance of nine methods on eight leading-case data sets, both simulated and real. The results demonstrate shortcomings of weighting based on the standard deviation or the range, as well as of other, more complex schemes in the literature. Weighting schemes based on carefully chosen estimates of within-cluster and between-cluster variability are generally more effective, and these estimates do not require knowledge of the cluster structure. Additional research is essential: worry-free approaches do not yet exist.
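
The two simple schemes the study finds wanting are easy to state: standardize each variable by its standard deviation or by its range before computing distances. The Python sketch below (hypothetical names, standard library only) implements both as weights in a weighted Euclidean distance; the better-performing schemes based on within- and between-cluster variability estimates are not reproduced because the abstract does not specify them.

```python
import math
import statistics


def weights_inverse_sd(data):
    """w_j = 1 / s_j**2, equivalent to dividing variable j by its standard deviation."""
    return [1.0 / statistics.stdev(col) ** 2 for col in zip(*data)]


def weights_inverse_range(data):
    """w_j = 1 / range_j**2, equivalent to dividing variable j by its range."""
    return [1.0 / (max(col) - min(col)) ** 2 for col in zip(*data)]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance sqrt(sum_j w_j * (a_j - b_j)**2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

Both schemes use only the pooled spread of each variable, so a variable whose spread comes mostly from differences between clusters is shrunk just as much as one that is pure noise; this is one plausible intuition for why they can perform poorly.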

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417