Similar Articles
20 similar articles found.
1.
Direct multicriteria clustering algorithms   Total citations: 1, self-citations: 0, citations by others: 1
In a multicriteria clustering problem, optimization over more than one criterion is required. The problem can be treated in different ways: by reduction to a clustering problem with a single criterion obtained as a combination of the given criteria; by constrained clustering algorithms, where a selected criterion is used as the clustering criterion and all others determine the constraints; or by direct algorithms. In this paper two types of direct algorithms for solving the multicriteria clustering problem are proposed: the modified relocation algorithm and the modified agglomerative algorithm. Different elaborations of these two types of algorithms are discussed and compared. Finally, two applications of the proposed algorithms are presented. Elaborated version of the talks presented at the First Conference of the International Federation of Classification Societies, Aachen, 1987, at the International Conference on Social Science Methodology, Dubrovnik, 1988, and at the Second Conference of the International Federation of Classification Societies, Charlottesville, 1989. This work was supported in part by the Research Council of Slovenia.
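The relocation idea can be illustrated with a small hypothetical sketch. It assumes that a "direct" move of one object is accepted only when the new partition Pareto-dominates the old one (no criterion gets worse and at least one improves). This acceptance rule, the function names and the toy data are assumptions for illustration, not the authors' modified relocation algorithm.

```python
def dominates(a, b):
    """True if criterion vector a is at least as good as b everywhere
    and strictly better somewhere (all criteria are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_relocation(labels, k, criteria, max_rounds=100):
    """Hypothetical multicriteria relocation heuristic.

    labels   -- initial cluster label (0..k-1) of every object
    criteria -- functions mapping a label list to a value to be minimised
    A single object is moved only if the new clustering dominates the old one.
    """
    labels = list(labels)
    current = [f(labels) for f in criteria]
    for _ in range(max_rounds):
        moved = False
        for i in range(len(labels)):
            for g in range(k):
                if g == labels[i]:
                    continue
                trial = labels[:]
                trial[i] = g
                values = [f(trial) for f in criteria]
                if dominates(values, current):
                    labels, current, moved = trial, values, True
                    break  # object i has moved; go on to the next object
        if not moved:
            break  # no single move dominates: a locally Pareto-optimal partition
    return labels, current


def wss(values):
    """Within-cluster sum of squares of one variable, used as one criterion."""
    def f(labels):
        total = 0.0
        for g in set(labels):
            pts = [v for v, l in zip(values, labels) if l == g]
            m = sum(pts) / len(pts)
            total += sum((p - m) ** 2 for p in pts)
        return total
    return f


# Two criteria, one per (hypothetical) variable, and a poor starting partition.
crit = [wss([0.0, 0.2, 5.0, 5.3]), wss([1.0, 1.1, 9.0, 8.7])]
labels, values = pareto_relocation([0, 1, 0, 1], 2, crit)
print(labels)  # [1, 1, 0, 0]
```

Because no criterion is ever allowed to get worse, the procedure avoids choosing weights for a combined criterion, at the price of stopping at the first partition that no single move can dominate.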

2.
The Grue Problem and the Simplicity Approach   Total citations: 2, self-citations: 0, citations by others: 2
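The grue problem is a paradigm case of the problem of hypothesis selection. The precise logical structure of a scientific hypothesis should include the situations in which the hypothesis applies and its sensitive factors. Simplicity is a widely accepted criterion for hypothesis selection; applying it to the grue problem has produced a syntactic theory and a semantic theory of simplicity. However, these two theories face, respectively, relativity to the representational system and relativity to the conceptual framework: they cannot handle certain grue-type problems and can even lead to counterintuitive results.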

3.
Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first yields conditional maximum likelihood estimates, which can be used when the time series are long. Simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields exact maximum likelihood estimates. A procedure for assessing the variability of the obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and to correctly assess the time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is considered in detail. The results are very promising, as the proposed approach overcomes the limitations of other methods developed so far.
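As a minimal sketch of the model-selection step, BIC in its "smaller is better" form is −2 log L + p log n, where L is the maximized likelihood, p the number of free parameters and n the number of observations. The log-likelihoods and parameter counts below are hypothetical placeholders standing in for the values that the paper's estimation procedures would supply.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion, smaller is better: -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_by_bic(fits, n_obs):
    """fits maps a candidate (e.g. number of components K) to a pair
    (maximized log-likelihood, number of free parameters).
    Returns the candidate with the smallest BIC."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))


# Hypothetical fitted values for K = 1, 2, 3 mixture components.
fits = {1: (-520.0, 4), 2: (-480.0, 9), 3: (-477.0, 14)}
print(choose_by_bic(fits, n_obs=200))  # 2
```

The same comparison can be run over candidate time series orders as well as over K, since BIC only needs the maximized log-likelihood and a parameter count for each fitted model.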

4.
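The "stability criterion" of the decoherence interpretation is regarded as a key link in solving the measurement problem. As the latest development of this interpretation, quantum Darwinism attempts to account for the stability criterion. Although partly supported by recent experimental evidence in physics, it still cannot fully solve the measurement problem. In 2016, a "new monadology" approach to the measurement problem was proposed, which uses the "perception-energy principle" and the "will-energy principle" to solve the definite-outcome problem and the preferred-basis problem within the measurement problem...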

5.
Optimization Strategies for Two-Mode Partitioning   Total citations: 2, self-citations: 2, citations by others: 0
Two-mode partitioning is a relatively new form of clustering that clusters both the rows and the columns of a data matrix. In this paper, we consider deterministic two-mode partitioning methods in which a criterion similar to k-means is optimized. A variety of optimization methods have been proposed for this type of problem. However, it is still unclear which method should be used, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode partitioning. Several known methods are discussed, and a new fuzzy steps method is introduced. The fuzzy steps method is based on the fuzzy c-means algorithm of Bezdek (1981) and the fuzzy steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performance of all methods is compared in a large simulation study. In our simulations, a two-mode k-means optimization method most often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode partitioning. We would like to thank two anonymous referees whose comments have improved the quality of this paper. We are also grateful to Peter Verhoef for providing the data set used in this paper.
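A generic alternating scheme for the two-mode k-means criterion, in which each cell is approximated by the mean of its (row cluster, column cluster) block, might look like the following. It is a hypothetical sketch, not the specific two-mode k-means implementation or the fuzzy steps method compared in the paper, and like them it can stop at a non-global optimum; the function names and the optional `rows`/`cols` starting partitions are assumptions.

```python
import random


def block_means(X, rows, cols, P, Q):
    """Mean of every (row cluster, column cluster) block; empty blocks get 0.0."""
    sums = [[0.0] * Q for _ in range(P)]
    counts = [[0] * Q for _ in range(P)]
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            sums[rows[i]][cols[j]] += x
            counts[rows[i]][cols[j]] += 1
    return [[sums[k][l] / counts[k][l] if counts[k][l] else 0.0
             for l in range(Q)] for k in range(P)]


def loss(X, rows, cols, C):
    """Two-mode k-means criterion: squared distance of each cell to its block mean."""
    return sum((x - C[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(X) for j, x in enumerate(row))


def two_mode_kmeans(X, P, Q, rows=None, cols=None, n_iter=100, seed=0):
    """Alternate row and column reassignment; no step can increase the loss."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    rows = list(rows) if rows is not None else [rng.randrange(P) for _ in range(n)]
    cols = list(cols) if cols is not None else [rng.randrange(Q) for _ in range(m)]
    C = block_means(X, rows, cols, P, Q)
    for _ in range(n_iter):
        previous = (rows, cols)
        # Row step: each row joins the row cluster whose block means fit it best.
        rows = [min(range(P), key=lambda k: sum((x - C[k][cols[j]]) ** 2
                                                for j, x in enumerate(row)))
                for row in X]
        C = block_means(X, rows, cols, P, Q)
        # Column step: the same idea with the roles of rows and columns swapped.
        cols = [min(range(Q), key=lambda l: sum((X[i][j] - C[rows[i]][l]) ** 2
                                                for i in range(n)))
                for j in range(m)]
        C = block_means(X, rows, cols, P, Q)
        if (rows, cols) == previous:
            break
    return rows, cols, C


X = [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 0, 0], [9, 9, 0, 0]]
rows, cols, C = two_mode_kmeans(X, 2, 2, rows=[0, 0, 1, 0], cols=[0, 0, 1, 1])
print(rows, cols, loss(X, rows, cols, C))  # [0, 0, 1, 1] [0, 0, 1, 1] 0.0
```

Since the result depends on the starting partitions, a sketch like this would normally be run from several random starts (different `seed` values) and the lowest-loss solution kept.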

6.
The Self-Organizing Feature Maps (SOFM; Kohonen 1984) algorithm is a well-known example of unsupervised learning in connectionism and is a clustering method closely related to k-means. Generally, the data set is available before the algorithm is run, and the clustering problem can be approached by optimizing an inertia criterion. In this paper we consider the probabilistic approach to this problem. We propose a new algorithm based on the Expectation Maximization principle (EM; Dempster, Laird, and Rubin 1977). The new method can be viewed as a Kohonen-type EM and gives better insight into the SOFM from the viewpoint of constrained clustering. We perform numerical experiments and compare our results with the standard Kohonen approach.
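For reference, the classical online Kohonen update moves every prototype towards the input in proportion to a neighbourhood weight centred on the best-matching unit. The Gaussian neighbourhood and the parameter names `lr` and `sigma` are assumptions of this sketch; it shows the standard Kohonen rule the paper compares against, not the proposed EM algorithm.

```python
import math


def som_step(prototypes, grid, x, lr, sigma):
    """One online Kohonen update; modifies `prototypes` in place.

    prototypes[u] -- weight vector of map unit u
    grid[u]       -- position of unit u on the map lattice
    Returns the index of the best-matching unit.
    """
    # Best-matching unit: prototype closest to x in squared Euclidean distance.
    bmu = min(range(len(prototypes)),
              key=lambda u: sum((p - xi) ** 2 for p, xi in zip(prototypes[u], x)))
    for u, w in enumerate(prototypes):
        # Gaussian neighbourhood on the lattice, centred on the winner.
        d2 = sum((a - b) ** 2 for a, b in zip(grid[u], grid[bmu]))
        h = math.exp(-d2 / (2 * sigma ** 2))
        prototypes[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# Three units on a 1-D lattice.
prototypes = [[0.0], [1.0], [2.0]]
grid = [(0.0,), (1.0,), (2.0,)]
print(som_step(prototypes, grid, [0.9], lr=0.5, sigma=1.0))  # 1
```

As `sigma` shrinks, the neighbours' weights vanish and only the winner moves, which reduces the update to online k-means; this is the link to k-means mentioned in the abstract.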

7.
We consider applying a functional logistic discriminant procedure to the analysis of handwritten character data. Time-course trajectories corresponding to the X and Y coordinate values of handwritten characters written in the air with one finger are converted into a functional data set via regularized basis expansion. We then apply functional logistic modeling to classify the functions into several classes. In order to select the values of the tuning parameters involved in the functional logistic model, we derive a model selection criterion for evaluating models estimated by the method of regularization. The results indicate the effectiveness of our modeling strategy in terms of prediction accuracy.

8.
The paper presents a methodology for classifying three-way dissimilarity data, which are reconstructed by a small number of consensus classifications of the objects, each defined by a sum of two order-constrained distance matrices, so as to identify both a partition and an indexed hierarchy. Specifically, the dissimilarity matrices are partitioned into homogeneous classes and, within each class, a partition and an indexed hierarchy are fitted simultaneously. The proposed model is formalized mathematically as a constrained mixed-integer quadratic problem to be fitted in the least-squares sense, and a computationally efficient alternating least-squares algorithm is proposed. Two applications of the methodology are also described, together with an extensive simulation study investigating the performance of the algorithm.

9.
We introduce new similarity measures between two subjects with reference to variables with multiple categories. In contrast to traditionally used similarity indices, they also take into account the frequency of the categories of each attribute in the sample. This feature is useful when dealing with rare categories, since it makes sense to evaluate the pairwise presence of a rare category differently from the pairwise presence of a widespread one. A weighting criterion for each category derived from Shannon's information theory is suggested. There are two versions of the weighted index: one for independent categorical variables and one for dependent variables. The suitability of the proposed indices is shown using both simulated and real-world data sets.
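To show why frequency weighting matters, here is a hypothetical index that weights a shared category by its self-information −log p (rarer categories carry more information) and normalizes so that identical profiles score 1. It is only in the spirit of the abstract: the authors' exact weighting and their version for dependent variables are not reproduced, and the function names and toy data are assumptions.

```python
import math
from collections import Counter


def category_weights(data):
    """For each variable, map every observed category c to -log p(c)."""
    n = len(data)
    weights = []
    for v in range(len(data[0])):
        counts = Counter(row[v] for row in data)
        weights.append({c: -math.log(k / n) for c, k in counts.items()})
    return weights


def weighted_similarity(a, b, weights):
    """Information-weighted matching: 1 for identical profiles, 0 for no matches."""
    num = sum(w[x] for x, y, w in zip(a, b, weights) if x == y)
    den = sum((w[x] + w[y]) / 2 for x, y, w in zip(a, b, weights))
    return num / den if den > 0 else 1.0


data = [("red", "A"), ("red", "B"), ("red", "A"), ("blue", "A"),
        ("blue", "B"), ("red", "A"), ("red", "A"), ("red", "A")]
w = category_weights(data)
rare = weighted_similarity(data[3], data[4], w)    # share the rare "blue"
common = weighted_similarity(data[0], data[1], w)  # share the common "red"
print(rare > common)  # True
```

Simple matching would give both pairs the same score (one match out of two variables); the weighted version ranks the pair sharing the rare category higher.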

10.
The Josephus Problem as Found in Fang Zhongtong's Shudu yan   Total citations: 1, self-citations: 1, citations by others: 0
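Juan 23 of Fang Zhongtong's Shudu yan (《数度衍》) contains a problem belonging to the Josephus problem; it is the only such problem known so far in ancient Chinese mathematical texts. The article first outlines the history of the Josephus problem, then introduces the problem as recorded by Fang Zhongtong in Shudu yan and analyzes its content and the issues it contains, and finally discusses the significance of the problem and offers a view on its origin.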
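For readers unfamiliar with the problem itself: n people stand in a circle and every k-th person is removed until one remains. A minimal sketch computes the survivor with the standard recurrence J(1) = 0, J(m) = (J(m−1) + k) mod m (0-based positions) and checks it by direct simulation. The abstract does not give the parameters of the problem in Fang's text, so the example uses n = 41, k = 3, a commonly cited textbook instance; the function names are illustrative.

```python
def josephus_survivor(n: int, k: int) -> int:
    """1-based position of the last person left when every k-th person
    is removed from a circle of n people."""
    pos = 0  # 0-based survivor position for a circle of size 1
    for size in range(2, n + 1):
        pos = (pos + k) % size
    return pos + 1


def elimination_order(n: int, k: int) -> list:
    """Simulate the counting-out directly and return the removal order."""
    people = list(range(1, n + 1))
    order, idx = [], 0
    while people:
        idx = (idx + k - 1) % len(people)
        order.append(people.pop(idx))
    return order


print(elimination_order(7, 3))     # [3, 6, 2, 7, 5, 1, 4]
print(josephus_survivor(41, 3))    # 31
```

The recurrence runs in O(n) time, while the simulation is O(n²) with a Python list; the simulation is kept here because it also lists the full order of removal, which is what such problems in old texts often ask for.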

11.
A column-generation-based approach is proposed for solving the clusterwise regression problem. The proposed strategy relies first on several efficient heuristic strategies to insert columns into the restricted master problem. If these heuristics fail to identify an improving column, an exhaustive search is performed starting with incrementally larger ending subsets, all the while iteratively performing heuristic optimization to ensure a proper balance of exact and heuristic optimization. Additionally, observations are sequenced by their dual variables and by their inclusion in joint pair branching rules. The proposed strategy is shown to outperform the best known alternative (BBHSE) when the number of clusters is greater than three. Additionally, the current work further demonstrates and expands the successful use of the new paradigm of using incrementally larger ending subsets to strengthen the lower bounds of a branch-and-bound search, as pioneered by Brusco's Repetitive Branch and Bound Algorithm (RBBA).

12.
In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, and it shows favorable results compared to other classical criteria.
Résumé: We propose an entropy criterion for assessing the number of classes in a partition, based on a mixture model of probability distributions. The criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Monte Carlo simulations illustrate its qualities compared with more classical criteria.
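The likelihood relation mentioned above can be checked numerically. For a fitted mixture with posterior probabilities t_ik, the observed-data log-likelihood L equals the classification log-likelihood C plus the entropy E = −Σ_i Σ_k t_ik log t_ik, so E measures the gap between L and C. The sketch uses a univariate Gaussian mixture with made-up parameters (an assumption for illustration) and does not implement the paper's full criterion.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def decompose(xs, weights, means, sds):
    """Return (L, C, E): observed-data log-likelihood, classification
    log-likelihood and entropy of a univariate Gaussian mixture."""
    L = C = E = 0.0
    for x in xs:
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        L += math.log(total)
        for j in joint:
            t = j / total  # posterior probability t_ik
            if t > 0:
                C += t * math.log(j)  # t_ik * log(pi_k f_k(x_i))
                E -= t * math.log(t)
    return L, C, E


xs = [-2.1, -1.7, 0.2, 1.9, 2.4]
L, C, E = decompose(xs, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
print(abs(L - (C + E)) < 1e-9)  # True
```

The identity follows from log(π_k f_k(x_i)) = log t_ik + log Σ_l π_l f_l(x_i): averaging over k with weights t_ik gives C = L − E. A small E means the components are well separated, which is what an entropy-based choice of the number of clusters exploits.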

13.
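The delimitation of terms has long been a research topic in terminology studies. The mutual permeation between terms and general-language words makes delimiting terms complex. How to provide an operational criterion for delimiting terms that can guide term recognition is a pressing practical problem. Proposing a recognition-oriented method of term delimitation is a meaningful contribution to both the compilation and the updating of terminology dictionaries.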

14.
Multiple-choice items on tests and Likert items on surveys are ubiquitous in educational, social and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010), which used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables, along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information about the response options, thus allowing an empirical examination of assumptions regarding the ordering of the response options and the estimation of their optimal scoring. To illustrate the methodology and the flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to the data using SAS.

15.
In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach to this drawback has been to adopt an arbitrary criterion to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that groups more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams' formula, which enables the algorithm to be implemented recursively. The authors thank A. Arenas for discussion and helpful comments. This work was partially supported by DGES of the Spanish Government Project No. FIS2006–13321–C02–02 and by a grant of Universitat Rovira i Virgili.
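The standard pairwise Lance and Williams recurrence, which the paper generalizes to merges of more than two clusters, gives the distance from a cluster k to a merged cluster i+j. The sketch also shows one hypothetical way to collect tied clusters: take connected components of the pairs that share the current minimum distance. This grouping rule is an assumption for illustration; the paper's generalized formula is not reproduced.

```python
def lance_williams(d_ki, d_kj, d_ij, n_i, n_j, method="single"):
    """Distance from cluster k to the merged cluster i+j:
    alpha_i*d_ki + alpha_j*d_kj + beta*d_ij + gamma*|d_ki - d_kj|."""
    if method == "single":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "average":  # UPGMA: weights proportional to cluster sizes
        a_i, a_j, beta, gamma = n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    else:
        raise ValueError(f"unknown method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


def tied_groups(D, tol=0.0):
    """Hypothetical tie handling: connected components of the cluster pairs
    whose distance equals the current minimum (within tol)."""
    n = len(D)
    dmin = min(D[i][j] for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] - dmin <= tol:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


D = [[0, 1, 2, 4], [1, 0, 1, 5], [2, 1, 0, 3], [4, 5, 3, 0]]
print(tied_groups(D))  # [[0, 1, 2]]
```

With single linkage the formula reduces to min(d_ki, d_kj) and with complete linkage to max(d_ki, d_kj). In the example, pairs (0, 1) and (1, 2) tie at distance 1, so a pair-group method would have to choose one of them arbitrarily, whereas a variable-group step can treat {0, 1, 2} together.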

16.
A consensus index method is an ordered pair consisting of a consensus method and a consensus index. Day and McMorris (1985) have specified two minimal axioms, one to be satisfied by the consensus method and the other by the consensus index. The axiom for consensus indices is not satisfied by the s-consensus index. In this paper, an additional axiom, which states that a consensus index equal to one implies profile unanimity, is proposed. The s-consensus method together with a modification of the s-consensus index (i.e., normalized by the number of distinct nontrivial clusters in the profile) is shown to satisfy the two axioms proposed by Day and McMorris and the new axiom.

17.
In this paper, two alternative loss criteria for the least-squares Procrustes problem are studied. These alternative criteria are based on the Huber function and on the more radical biweight function, both of which are designed to be resistant to outliers. Using iterative majorization, it is shown how a convergent reweighted least-squares algorithm can be developed. In a simulation study, the proposed methods turn out to perform well over a specific range of contamination. When a uniform dilation factor is included, mixed results are obtained. The methods also yield a set of weights that can be used for diagnostic purposes.
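As a simplified illustration of reweighted least squares with these two weight functions, the sketch fits only a 2-D rotation (orthogonal Procrustes without reflection or dilation) and reweights each point by the Huber or biweight function of its residual. The restriction to two dimensions, the tuning constant `c`, the plain reweighting loop and the toy data are assumptions; the paper derives its algorithm via iterative majorization.

```python
import math


def huber_weight(r, c):
    """Huber weight: full weight for small residuals, c/r beyond c."""
    return 1.0 if r <= c else c / r


def biweight_weight(r, c):
    """Tukey biweight: smoothly down-weights, zero weight beyond c."""
    return (1 - (r / c) ** 2) ** 2 if r < c else 0.0


def weighted_rotation(X, Y, w):
    """Angle of the 2-D rotation R minimising sum_i w_i ||R x_i - y_i||^2."""
    s = sum(wi * (x1 * y2 - x2 * y1) for wi, (x1, x2), (y1, y2) in zip(w, X, Y))
    c = sum(wi * (x1 * y1 + x2 * y2) for wi, (x1, x2), (y1, y2) in zip(w, X, Y))
    return math.atan2(s, c)


def rotate(p, theta):
    x1, x2 = p
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))


def robust_rotation(X, Y, weight_fn, c, n_iter=50):
    """Alternate a weighted Procrustes fit with a reweighting of the points."""
    w = [1.0] * len(X)
    theta = 0.0
    for _ in range(n_iter):
        theta = weighted_rotation(X, Y, w)
        residuals = [math.dist(rotate(x, theta), y) for x, y in zip(X, Y)]
        w = [weight_fn(r, c) for r in residuals]
    return theta, w


# Four points rotated by 30 degrees plus one gross outlier.
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
Y = [rotate(p, math.pi / 6) for p in X[:4]] + [(5.0, -5.0)]
theta_bw, w_bw = robust_rotation(X, Y, biweight_weight, c=3.0)
print(round(math.degrees(theta_bw), 6), w_bw[-1])  # 30.0 0.0
```

The final weights play the diagnostic role mentioned in the abstract: here the outlier ends with weight 0. The unweighted fit is pulled far from 30 degrees, and Huber weights reduce but do not remove that pull, which mirrors the "more radical" character of the biweight.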

18.
In this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions, such as that between dimensionless and dimensional physical constants, and the classification of constants proposed by Lévy-Leblond. Then we explore how two great analogies, computational and biological, can give new insights into our problem. This paper includes a preliminary study examining the two analogies. Importantly, analogies are both useful and fundamental cognitive tools, but they can also be misused or misinterpreted. The idea that our universe might be modelled as a computational entity is analysed, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of “Cosmological Natural Selection” with a biological analogy in mind. We examine an extension of this analogy involving intelligent life. We discuss whether and how this extension could be legitimated.

19.
This paper proposes a maximum clustering similarity (MCS) method for determining the number of clusters in a data set by studying the behavior of similarity indices comparing two (of several) clustering methods. The similarity between the two clusterings is calculated at the same number of clusters, using the indices of Rand (R), Fowlkes and Mallows (FM), and Kulczynski (K), each corrected for chance agreement. The number of clusters at which the index attains its maximum is a candidate for the optimal number of clusters. The proposed method is applied to simulated bivariate normal data and is further extended for use with circular data. Its performance is compared to the criteria discussed in Tibshirani, Walther, and Hastie (2001). The proposed method is not based on any distributional or data assumption, which makes it widely applicable to any type of data that can be clustered using at least two clustering algorithms.
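A minimal sketch of the selection rule, using only the chance-corrected Rand index (Hubert and Arabie's adjusted Rand index); the Fowlkes and Mallows and Kulczynski variants are omitted. The label vectors are assumed to come from two arbitrary clustering algorithms run at each candidate number of clusters K; the toy labels and function names are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand(a, b):
    """Hubert-Arabie adjusted Rand index between two label vectors."""
    n = len(a)
    pairs = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial: one cluster, or all singletons
        return 1.0
    return (pairs - expected) / (max_index - expected)


def mcs_choose_k(labels_a, labels_b):
    """labels_a[K], labels_b[K]: partitions into K clusters from two methods.
    Return the K at which the two methods agree most."""
    return max(labels_a, key=lambda K: adjusted_rand(labels_a[K], labels_b[K]))


labels_a = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
labels_b = {2: [1, 1, 1, 0, 0, 0], 3: [0, 1, 1, 2, 2, 2]}
print(mcs_choose_k(labels_a, labels_b))  # 2
```

The index is invariant to relabelling, so at K = 2 the two partitions above agree perfectly (index 1) even though their labels are swapped, while at K = 3 they agree only slightly better than chance.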

20.
Block-Relaxation Approaches for Fitting the INDCLUS Model   Total citations: 1, self-citations: 1, citations by others: 0
A well-known clustering model for representing I × I × J data blocks, the J frontal slices of which consist of I × I object-by-object similarity matrices, is the INDCLUS model. This model implies a grouping of the I objects into a prespecified number of overlapping clusters, with each cluster having a slice-specific positive weight. An INDCLUS model is fitted to a given data set by minimizing a least-squares loss function. The minimization of this loss function has proven to be a difficult problem, for which several algorithmic strategies have been proposed. At present, the best available option seems to be the SYMPRES algorithm, which minimizes the loss function by means of a block-relaxation algorithm. Yet SYMPRES is conjectured to suffer from a severe local optima problem. As a way out, based on theoretical results on the optimal design of block-relaxation algorithms, five alternative block-relaxation algorithms are proposed. A simulation study shows that the alternative algorithms with overlapping parameter subsets perform best and clearly outperform SYMPRES in terms of optimization performance and cluster recovery.
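To make the model concrete: in the usual INDCLUS formulation, slice j is approximated by s_ik ≈ Σ_r p_ir w_jr p_kr + c_j for i ≠ k, with binary memberships p_ir, slice weights w_jr ≥ 0 and an additive constant c_j (the constant comes from the standard model and is not mentioned in the abstract). The sketch evaluates the least-squares loss and performs one simple block update, c_j with P and W held fixed; it is not SYMPRES or any of the five proposed algorithms, and the function names are hypothetical.

```python
def fitted(P, w, c):
    """Model matrix of one slice: sum_r p_ir * w_r * p_kr + c."""
    I, R = len(P), len(P[0])
    return [[sum(P[i][r] * w[r] * P[k][r] for r in range(R)) + c
             for k in range(I)] for i in range(I)]


def indclus_loss(S, P, W, c):
    """Least-squares loss over all slices; diagonal cells are ignored."""
    I = len(P)
    total = 0.0
    for Sj, wj, cj in zip(S, W, c):
        M = fitted(P, wj, cj)
        total += sum((Sj[i][k] - M[i][k]) ** 2
                     for i in range(I) for k in range(I) if i != k)
    return total


def update_constants(S, P, W):
    """One block of a block-relaxation scheme: with P and W fixed, the
    least-squares c_j is the mean off-diagonal residual of slice j."""
    I = len(P)
    out = []
    for Sj, wj in zip(S, W):
        M = fitted(P, wj, 0.0)
        res = [Sj[i][k] - M[i][k] for i in range(I) for k in range(I) if i != k]
        out.append(sum(res) / len(res))
    return out


# Three objects, two overlapping clusters (object 1 belongs to both), two slices.
P = [[1, 0], [1, 1], [0, 1]]
W = [[2.0, 1.0], [0.5, 3.0]]
c = [0.1, 0.2]
S = [fitted(P, W[j], c[j]) for j in range(2)]  # error-free data for the demo
print(indclus_loss(S, P, W, c))  # 0.0
```

Block relaxation cycles such conditional updates over parameter subsets (here c, elsewhere W and the rows of P); each update cannot increase the loss, but the fixed point reached can still be a local optimum, which is the problem the paper addresses.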


Copyright © Beijing Qinyun Technology Development Co., Ltd.   Beijing ICP Registration No. 09084417