首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The paper presents a methodology for classifying three-way dissimilarity data, which are reconstructed by a small number of consensus classifications of the objects each defined by a sum of two order constrained distance matrices, so as to identify both a partition and an indexed hierarchy. Specifically, the dissimilarity matrices are partitioned in homogeneous classes and, within each class, a partition and an indexed hierarchy are simultaneously fitted. The model proposed is mathematically formalized as a constrained mixed-integer quadratic problem to be fitted in the least-squares sense and an alternating least-squares algorithm is proposed which is computationally efficient. Two applications of the methodology are also described together with an extensive simulation to investigate the performance of the algorithm.  相似文献   

2.
O (n 4), where n is the number of objects. We describe the application of the MVR method to two data models: the weighted least-squares (WLS) model (V is diagonal), where the MVR method can be reduced to an O(n 3) time complexity; a model arising from the study of biological sequences, which involves a complex non-diagonal V matrix that is estimated from the dissimilarity matrix Δ. For both models, we provide simulation results that show a significant error reduction in the reconstruction of T, relative to classical agglomerative algorithms.  相似文献   

3.
Suppose that we rank-order the conditional probabilities for a group of subjects that are provided from a Bayesian network (BN) model of binary variables. The conditional probability is the probability that a subject has a certain attribute given an outcome of some other variables and the classification is based on the rank-order. Under the condition that the class sizes are equal across the class levels and that all the variables in the model are positively associated with each other, we compared the classification results between models of binary variables which share the same model structure. In the comparison, we used a BN model, called a similar BN model, which was constructed under some rule based on a set of BN models satisfying certain conditions. Simulation results indicate that the agreement level of the classification between a set of BN models and their corresponding similar BN model is considerably high with the exact agreement for about half of the subjects or more and the agreement up to one-class-level difference for about 90% or more.  相似文献   

4.
A new projection-pursuit index is used to identify clusters and other structures in multivariate data. It is obtained from the variance decompositions of the data’s one-dimensional projections, without assuming a model for the data or that the number of clusters is known. The index is affine invariant and successful with real and simulated data. A general result is obtained indicating that clusters’ separation increases with the data’s dimension. In simulations it is thus confirmed, as expected, that the performance of the index either improves or does not deteriorate when the data’s dimension increases, making it especially useful for “large dimension-small sample size” data. The efficiency of this index will increase with the continuously improved computer technology. Several applications are presented.  相似文献   

5.
When clustering asymmetric proximity data, only the average amounts are often considered by assuming that the asymmetry is due to noise. But when the asymmetry is structural, as typically may happen for exchange flows, migration data or confusion data, this may strongly affect the search for the groups because the directions of the exchanges are ignored and not integrated in the clustering process. The clustering model proposed here relies on the decomposition of the asymmetric dissimilarity matrix into symmetric and skew-symmetric effects both decomposed in within and between cluster effects. The classification structures used here are generally based on two different partitions of the objects fitted to the symmetric and the skew-symmetric part of the data, respectively; the restricted case is also presented where the partition fits jointly both of them allowing for clusters of objects similar with respect to the average amounts and directions of the data. Parsimonious models are presented which allow for effective and simple graphical representations of the results.  相似文献   

6.
7.
X is the automatic hierarchical classification of one mode (units or variables or occasions) of X on the basis of the other two. In this paper the case of OMC of units according to variables and occasions is discussed. OMC is the synthesis of a set of hierarchical classifications Delta obtained from X; e.g., the OMC of units is the consensus (synthesis) among the set of dendograms individually defined by clustering units on the basis of variables, separately for each given occasion of X. However, because Delta is often formed by a large number of classifications, it may be unrealistic that a single synthesis is representative of the entire set. In this case, subsets of similar (homegeneous) dendograms may be found in Delta so that a consensus representative of each subset may be identified. This paper proposes, PARtition and Least Squares Consensus cLassifications Analysis (PARLSCLA) of a set of r hierarchical classifications Delta. PARLSCLA identifies the best least-squares partition of Delta into m (1 <= m <= r) subsets of homogeneous dendograms and simultaneously detects the closest consensus classification (a median classification called Least Squares Consensus Dendogram (LSCD) for each subset. PARLSCLA is a generalization of the problem to find a least-squares consensus dendogram for Delta. PARLSCLA is formalized as a mixed-integer programming problem and solved with an iterative, two-step algorithm. The method proposed is applied to an empirical data set.  相似文献   

8.
9.
The mean-shift algorithm is an iterative method of mode seeking and data clustering based on the kernel density estimator. The blurring mean-shift is an accelerated version which uses the original data only in the first step, then re-smoothes previous estimates. It converges to local centroids, but may suffer from problems of asymptotic bias, which fundamentally depend on the design of its smoothing components. This paper develops nearest-neighbor implementations and data-driven techniques of bandwidth selection, which enhance the clustering performance of the blurring method. These solutions can be applied to the whole class of mean-shift algorithms, including the iterative local mean method. Extended simulation experiments and applications to well known data-sets show the goodness of the blurring estimator with respect to other algorithms.  相似文献   

10.
针对描述CDNA、DNA、氨基酸和蛋白质序列等亲缘关系的3个专用词语——同源性、一致性和相似性在生物类论文中交错使用的问题,对其具体含义和运用进行了分析,以明晰各词语在论文中的准确使用,提高编辑校对质量。  相似文献   

11.
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.  相似文献   

12.
We consider the problem of combining multiple dissimilarity representations via the Cartesian product of their embeddings. For concreteness, we choose the inferential task at hand to be classification. The high dimensionality of this Cartesian product space implies the necessity of dimensionality reduction before training a classifier. We propose a supervised dimensionality reduction method, which utilizes the class label information, to help achieve a favorable combination. The simulation and real data results show that our approach can improve classification accuracy compared to the alternatives of principal components analysis and no dimensionality reduction at all.  相似文献   

13.
We assemble here properties of certain dissimilarity coefficients and are specially concerned with their metric and Euclidean status. No attempt is made to be exhaustive as far as coefficients are concerned, but certain mathematical results that we have found useful are presented and should help establish similar properties for other coefficients. The response to different types of data is investigated, leading to guidance on the choice of an appropriate coefficient.The authors wish to thank the referees, one of whom did a magnificent job in painstakingly checking the detailed algebra and detecting several slips.  相似文献   

14.
15.
信息技术(InformationTechnology,IT)术语的数量十分庞大,但组合语所占比率很高,由各单词的含义“猜中”组合语的含义的比率更高。因此本文选出1000个基础术语元(素),经对比分析可知:两岸术语元完全相同的约占六成;加上虽有差别却不言自明的术语,“易交流度”约达七成;对于定名不同,但与概念一一对应的术语,经熟悉对方用语的短暂过程后,易交流度升至约八成。而对同名异“实”,会引起严重误导的术语,提出“直译首选”、“压缩多一对应”和“避让既占”三条规则并以此作了优化模拟,假如定名时都遵守这些规则,易交流度几乎达到九成。另根据影响易交流程度的同一量化方案,估计出IT术语在“英语圈”的易交流度为90%~92%。可见“华语圈”要达到这样的水平,绝非不可企及,问题在于协调和优化。  相似文献   

16.
The triangular inequality is a defining property of a metric space, while the stronger ultrametric inequality is a defining property of an ultrametric space. Ultrametric distance is defined from p-adic valuation. It is known that ultrametricity is a natural property of spaces in the sparse limit. The implications of this are discussed in this article. Experimental results are presented which quantify how ultrametric a given metric space is. We explore the practical meaningfulness of this property of a space being ultrametric. In particular, we examine the computational implications of widely prevalent and perhaps ubiquitous ultrametricity.  相似文献   

17.

Reviewers

Guest Reviewers, Journal of Classification Volume 26, 2009  相似文献   

18.
19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号