Similar Documents
20 similar documents found (search time: 15 ms)
1.
Given a set of objects and a symmetric matrix of dissimilarities between them, Unidimensional Scaling is the problem of finding a representation by locating points on a continuum. Approximating dissimilarities by the absolute value of the difference between coordinates on a line constitutes a serious computational problem. This paper presents an algorithm that implements Simulated Annealing in a new way, via a strategy based on a weighted alternating process that uses permutations and point-wise translations to locate the optimal configuration. Explicit implementation details are given for least squares loss functions and for least absolute deviations. The weighted, alternating process is shown to outperform earlier implementations of Simulated Annealing and other optimization strategies for Unidimensional Scaling in run time efficiency, in solution quality, or in both.
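The abstract above does not reproduce the paper's weighted alternating scheme in detail; as a rough illustration of simulated annealing applied to the least-squares unidimensional scaling loss, the following Python sketch mixes coordinate-swap ("permutation") moves and point-wise translation moves under a geometric cooling schedule. All function names, parameter values, and the move mixture are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's exact weighted alternating scheme):
# simulated annealing for least-squares unidimensional scaling.
import numpy as np

def uds_loss(x, D):
    """Least-squares loss: sum over pairs of (d_ij - |x_i - x_j|)^2."""
    diff = np.abs(x[:, None] - x[None, :])
    return np.sum(np.triu(D - diff, k=1) ** 2)

def anneal_uds(D, n_iter=20000, t0=1.0, cooling=0.9995, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    x = rng.normal(size=n)                  # random initial configuration
    best_x, best_loss = x.copy(), uds_loss(x, D)
    loss, t = best_loss, t0
    for _ in range(n_iter):
        cand = x.copy()
        if rng.random() < 0.5:              # "permutation" move: swap two coordinates
            i, j = rng.choice(n, size=2, replace=False)
            cand[i], cand[j] = cand[j], cand[i]
        else:                               # "translation" move: shift one point
            cand[rng.integers(n)] += rng.normal(scale=step)
        cand_loss = uds_loss(cand, D)
        if cand_loss < loss or rng.random() < np.exp((loss - cand_loss) / t):
            x, loss = cand, cand_loss
            if loss < best_loss:
                best_x, best_loss = x.copy(), loss
        t *= cooling                        # geometric cooling schedule
    return best_x, best_loss
```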

2.
This paper reviews and compares several optimization strategies for unidimensional scaling in the L2-norm: (1) dynamic programming; (2) an iterative quadratic assignment improvement heuristic; (3) the Guttman update strategy as modified by Pliner's technique of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and Tse. The methods are all implemented through (freely downloadable) MATLAB m-files; their use is illustrated by a common data set carried throughout. For the computationally intensive dynamic programming formulation that can guarantee a globally optimal solution, several possible computational improvements are discussed and evaluated using (a) a transformation of a given m-function with the MATLAB Compiler into C code and compiling the latter; (b) rewriting an m-function and a mandatory MATLAB gateway directly in Fortran and compiling into a MATLAB callable file; (c) comparisons of the acceleration of raw m-files implemented under the most recent release of MATLAB Version 6.5 (and compared to the absence of such acceleration under the previous MATLAB Version 6.1). Finally, and in contrast to the combinatorial optimization task of identifying a best unidimensional scaling for a given proximity matrix, an approach is given for the confirmatory fitting of a given unidimensional scaling based only on a fixed object ordering, and for nonmetric unidimensional scaling that incorporates an additional optimal monotonic transformation of the proximities.
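As one concrete illustration of the confirmatory fitting for a fixed object ordering mentioned at the end of the abstract, the sketch below (written in Python rather than the authors' MATLAB m-files) uses a standard closed-form result from this literature: for a given ordering, the least-squares coordinates are obtained from row sums of the reordered proximity matrix, provided the resulting coordinates respect the given order. The function names are assumptions.

```python
# A minimal sketch of confirmatory fitting for a FIXED object ordering in the
# L2-norm. Assumes the closed-form coordinates are consistent with the order.
import numpy as np

def coords_for_ordering(P, order):
    """Least-squares coordinates for a given left-to-right ordering.

    P      : (n, n) symmetric proximity (dissimilarity) matrix
    order  : sequence of object indices giving their left-to-right positions
    returns: coordinates aligned with `order`
    """
    n = len(order)
    Q = P[np.ix_(order, order)]             # reorder rows and columns
    coords = np.empty(n)
    for i in range(n):
        before = Q[i, :i].sum()             # proximities to objects placed left of i
        after = Q[i, i + 1:].sum()          # proximities to objects placed right of i
        coords[i] = (before - after) / n
    return coords

def l2_loss(coords, Q):
    """Least-squares loss of the fitted coordinates under this fixed ordering."""
    diff = np.abs(coords[:, None] - coords[None, :])
    return np.sum(np.triu(Q - diff, k=1) ** 2)
```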

3.
This paper presents a method for L1 optimization under linear inequality constraints based upon iteratively reweighted iterative projection (or IRIP). IRIP is compared to a linear programming (LP) strategy for L1 minimization (Späth 1987, Chapter 5.3), using the ultrametric condition as an exemplar class of constraints to be fitted. Coded for general constraints, the LP approach proves to be faster. Both methods, however, suffer from a serious limitation in being unable to process reasonably-sized data sets because of storage requirements for the constraints. When the simplicity of vector projections is used to allow IRIP to be coded for specific (in this case, ultrametric) constraints, we obtain a fast and efficient algorithm capable of handling large data sets. It is also possible to extend IRIP to operate as a heuristic search strategy that simultaneously identifies both a reasonable set of constraints to impose and the optimally-estimated parameters satisfying these constraints. A few noteworthy characteristics of L1-optimal ultrametrics are discussed, including other strategies for reformulating the ultrametric optimization problem.
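The ultrametric condition used above as the exemplar constraint class can be stated as a three-point condition: for every triple of objects, the two largest of the three pairwise values must be equal. A minimal Python sketch of checking this condition is given below; it illustrates the constraints only, not IRIP or the LP strategy.

```python
# A minimal sketch of the ultrametric (three-point) condition: for every triple,
# the two largest of the three pairwise dissimilarities must be equal.
import numpy as np
from itertools import combinations

def is_ultrametric(D, tol=1e-8):
    """D: (n, n) symmetric dissimilarity matrix. Returns True if ultrametric."""
    n = D.shape[0]
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((D[i, j], D[i, k], D[j, k]))
        if c - b > tol:           # the two largest values must tie
            return False
    return True
```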

4.
This paper presents strategies for fitting ultrametric and additive trees to proximity data based on a least absolute deviations (L1) criterion. Examples of ultrametric and additive trees fitted to two extant data sets are given, plus a Monte Carlo analysis to assess the impact of both typical data error and extreme values on fitted trees. Solutions are compared to the least-squares (L2) approach of Hubert and Arabie (1995a), with results indicating that (with these data) the L1 and L2 optimization strategies perform very similarly. A number of observations are made concerning possible uses of an L1 approach, the nature and number of identified locally optimal solutions, and metric recovery differences between ultrametrics and additive trees.

5.
A column generation based approach is proposed for solving the cluster-wise regression problem. The proposed strategy relies first on several efficient heuristics to insert columns into the restricted master problem. If these heuristics fail to identify an improving column, an exhaustive search is performed, starting with incrementally larger ending subsets, while iteratively performing heuristic optimization to ensure a proper balance of exact and heuristic optimization. Additionally, observations are sequenced by their dual variables and by their inclusion in joint pair branching rules. The proposed strategy is shown to outperform the best known alternative (BBHSE) when the number of clusters is greater than three. Additionally, the current work further demonstrates and expands the successful use of the new paradigm of using incrementally larger ending subsets to strengthen the lower bounds of a branch and bound search, as pioneered by Brusco's Repetitive Branch and Bound Algorithm (RBBA).

6.
A new approach to isotonic agglomerative hierarchical clustering
Hierarchical clustering methods must be isotonic in order to construct an ultrametric. We present a general strategy for widening the class of isotonic methods implemented by agglomerative algorithms. At each step of the agglomeration we allow one of several admissible pairs to be chosen. Under mild assumptions, an appropriate definition of admissibility then guarantees isotony. Moreover, we consider the use of the new methods to compute locally optimal ultrametrics. Two examples demonstrate the ability to define new agglomerative methods superior to their traditional competitors.

7.
A view of evolution is presented in this paper (the first of a two-paper series), intended as a methodological infrastructure for modeling spatio-cultural systems (the design outline of such a model is presented in paper II). A motivation for the re-articulation of evolution as information dynamics is the phenomenologically discovered prerequisite of embedding a meaning-attributing apparatus in any and all models of spatio-cultural systems. An evolution is construed as the dynamics of a complex system comprised of memory devices, connected in an ordered fashion (not randomly) by information exchanges. An information exchange transpires when the recipient system adopts a strategy (a continuum of events) that eventually changes its structure; namely, after the exchange, it contains and conveys different information. These memory devices (sub-systems) are themselves similarly constructed complex systems. Only a part of the information is retained by a system in its physical-memory storage, which eventually loses this function too, when the ability to retrieve a common enough structure is lost. The entire amount of information is a system's structure of connections (information exchanges); it is contained (apparently stored) dynamically when a system is observed in a temporarily stable state. This temporary permanence (robustness and resilience) is attained dynamically; namely, it is enabled by changes taking place in its sub-systems and in each of their sub-systems. Therefore, for modeling such a system a multi-layer/multi-scale approach is preferable. It enables the addition and subtraction of an interim scale and the consideration of such a scale as a micro or a macro scale (thus initiating a maneuver up or down scale) according to an ad-hoc requirement of the model, which imitates an envisaged sub-system (called an "Inner World") to which a certain range of decisions is relegated. This dynamics is driven (in time) by dynamically originated information growth, as defined by Shannon; i.e., by the fact that each of certain state transitions occurred sometime in the past of the system just so and not otherwise, and by the fact that they occurred in a certain order. Therefore, each "history" and each memory-retrieval availability of information is unique, and thus can be used to differentiate meanings. Hence, there cannot be a comprehensive solution to the meaning-attribution model-design challenge. However, the observation that at the core of each envisaged complex system, moving in time according to a rounded logic, there is an information-manipulating device operating necessarily according to a Boolean logic, can be copied into the design of a model. This observation therefore enables the embedding of a specific, locally fitted, meaning-attributing device, which is an information-manipulating mechanism (it splices/attaches one segment of information to another: its meaning). However, this is just a framework; the actual solution has still to be found locally, for each subject system. Such a solution is demonstrated for the change of location or layout in the Israeli city in paper II.

8.
A mathematical programming approach to fitting general graphs
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.
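The network distance mentioned here is the minimum-path-length metric on a connected graph; a minimal Python sketch of computing it for all pairs via the Floyd-Warshall recursion follows. The algorithm choice and the function name are assumptions for illustration, not the authors' fitting procedure.

```python
# A minimal sketch of the minimum-path-length metric used as the network
# distance for a connected graph: all-pairs shortest paths via Floyd-Warshall.
import numpy as np

def network_distances(link_lengths):
    """link_lengths: (n, n) matrix with edge lengths for linked node pairs,
    np.inf where no direct link exists, and 0 on the diagonal."""
    d = np.array(link_lengths, dtype=float)
    n = d.shape[0]
    for k in range(n):                       # allow paths passing through node k
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d
```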

9.
A Binary Integer Program to Maximize the Agreement Between Partitions
This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known; however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions for problems of practical size.

10.
In various data settings, it is necessary to compare observations from disparate data sources. We assume the data are in the dissimilarity representation (Pękalska and Duin, 2005) and investigate a joint embedding method (Priebe et al., 2013) that results in a commensurate representation of disparate dissimilarities. We further assume that there are "matched" observations from different conditions which, for the sake of inference, can be considered to be highly similar. The joint embedding results in the joint optimization of fidelity (preservation of within-condition dissimilarities) and commensurability (preservation of between-condition dissimilarities between matched observations). We show that the tradeoff between these two criteria can be made explicit using weighted raw stress as the objective function for multidimensional scaling. In our investigations, we use a weight parameter, w, to control the tradeoff, and choose match detection as the inference task. Our results show that the weights that are optimal with respect to the inference task differ from equal weights for commensurability and fidelity, and that the proposed weighted embedding scheme provides significant improvements in statistical power.
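A minimal Python sketch of the weighted raw stress objective, and of one possible way of spreading a single tradeoff weight w over fidelity (within-condition) and commensurability (matched between-condition) pairs, is given below. The exact weighting scheme used in the paper may differ; the block-and-matched-pair assignment here is an assumption.

```python
# A minimal sketch of weighted raw stress for multidimensional scaling, plus an
# assumed scheme for allocating a single tradeoff weight w across pair types.
import numpy as np

def weighted_raw_stress(X, Delta, W):
    """X: (n, d) configuration; Delta: (n, n) target dissimilarities;
    W: (n, n) symmetric non-negative weights.
    Returns sum_{i<j} w_ij * (d_ij(X) - delta_ij)^2."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices_from(D, k=1)
    return np.sum(W[iu] * (D[iu] - Delta[iu]) ** 2)

def tradeoff_weights(n_per_condition, w):
    """Assumed allocation for two conditions with n matched observations each:
    weight (1 - w) on within-condition (fidelity) pairs and
    weight w on matched between-condition (commensurability) pairs."""
    n = n_per_condition
    W = np.zeros((2 * n, 2 * n))
    W[:n, :n] = W[n:, n:] = 1.0 - w          # fidelity: within-condition pairs
    idx = np.arange(n)
    W[idx, n + idx] = W[n + idx, idx] = w    # commensurability: matched pairs
    return W
```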

11.
Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first yields the conditional maximum likelihood estimates, which can be used when the length of the time series is substantial; simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields the exact maximum likelihood estimates. A procedure for assessing variability in the obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and correctly assess time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is considered in detail. The results are very promising, as the proposed approach overcomes the limitations of other methods developed so far.
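The role of the Bayesian information criterion in choosing the number of mixture components can be illustrated with the standard formula BIC = -2 log L + k log n. In the Python sketch below, `fit_mixture` is a hypothetical placeholder for whichever estimation routine (conditional ML or exact ML via the Kalman filter) is used; it is assumed to return the maximized log-likelihood and the number of free parameters.

```python
# A minimal sketch of BIC-based selection of the number of mixture components.
# `fit_mixture` is a hypothetical placeholder supplied by the caller.
import numpy as np

def select_components(data, n_obs, max_components, fit_mixture):
    best = None
    for K in range(1, max_components + 1):
        loglik, n_params = fit_mixture(data, K)          # hypothetical fitting routine
        bic = -2.0 * loglik + n_params * np.log(n_obs)   # standard BIC formula
        if best is None or bic < best[1]:                # smaller BIC is preferred
            best = (K, bic)
    return best   # (chosen number of components, its BIC)
```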

12.
Classification is a general method and means by which every discipline comes to know its objects of study and acquires and organizes knowledge. This article mainly studies the types of terms: starting from the three aspects of linguistic form, semantics, and function, and drawing on different criteria for dividing term types, it attempts to establish a basic system of term types.

13.
This article briefly reviews the current state and development trends of process-science research in industrial biotechnology. It presents the key advances of the 973 Program project "Fundamental Process Science Research in Industrial Biotechnology" and their international standing in five areas: population effects in large-scale cell culture and measurement methods for complex multiphase flow fields; principles of efficient global optimization for green industrial bioprocesses; new strategies for optimizing biotransformation processes; new methods for the direct scale-up of biochemical reaction processes; and new methods for optimizing industrial fermentation processes. It then proposes future strategic directions for process-science research in industrial biotechnology.

14.
Clique optimization (CLOPT) is a family of graph clustering procedures that construct parsimonious ultrametrics by executing a sequence of divisive and agglomerative operations. Every CLOPT procedure is associated with a distinct graph-partitioning heuristic. Seven HCS methods, a mathematical programming algorithm, and two CLOPT heuristics were evaluated on simulated data. These data were obtained by distorting ultrametric partitions and hierarchies. In general, internally optimal models yielded externally optimal models. By recovering near-optimal solutions more consistently, CLOPT2 emerged as the most robust technique.

15.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, that is, curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach which optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives, including regression mixture models and the K-means-like algorithm for piecewise regression, demonstrate the effectiveness of the proposed approach.
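The optimal segmentation by dynamic programming can be illustrated generically: precompute the fitting cost of a single polynomial on every candidate segment, then recursively choose the best split into a given number of segments. The Python sketch below uses per-segment sum of squared errors as the cost; the function names, the cost definition, and the use of plain polynomial fits are assumptions and not the authors' exact formulation.

```python
# A minimal generic sketch of optimal segmentation of one curve into a fixed
# number of segments by dynamic programming, with per-segment polynomial SSE.
import numpy as np

def segment_cost(t, y, i, j, degree):
    """SSE of a degree-`degree` polynomial fit to points i..j-1."""
    coef = np.polyfit(t[i:j], y[i:j], degree)
    resid = y[i:j] - np.polyval(coef, t[i:j])
    return float(resid @ resid)

def optimal_segmentation(t, y, n_segments, degree=1):
    n = len(t)
    INF = np.inf
    # cost[i, j]: SSE of one polynomial on points i..j-1 (needs >= degree+1 points)
    cost = np.full((n + 1, n + 1), INF)
    for i in range(n):
        for j in range(i + degree + 1, n + 1):
            cost[i, j] = segment_cost(t, y, i, j, degree)
    opt = np.full((n_segments + 1, n + 1), INF)   # opt[r, j]: best cost of r segments on points 0..j-1
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    opt[0, 0] = 0.0
    for r in range(1, n_segments + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if opt[r - 1, i] + cost[i, j] < opt[r, j]:
                    opt[r, j] = opt[r - 1, i] + cost[i, j]
                    back[r, j] = i
    # recover the segment boundaries by backtracking
    bounds, j = [n], n
    for r in range(n_segments, 0, -1):
        j = back[r, j]
        bounds.append(j)
    return opt[n_segments, n], bounds[::-1]
```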

16.
Over the past century of social transformation in China, universities have gradually fallen behind, from the role of leader to that of follower. The main reasons include the administrative tendencies of universities, the alienation of government power, irrational competition among universities, and an internal rent-seeking culture. For Chinese universities to achieve a rapid transformation that matches or anticipates that of society, institutional innovation must be the premise, knowledge and technological innovation the driving force, solving the key problems of social development the mission, and cultivating large numbers of entrepreneurs the goal.

17.
Bayesian classification is currently of considerable interest. It provides a strategy for eliminating the uncertainty associated with a particular choice of classifier model parameters, and is the optimal decision-theoretic choice under certain circumstances when there is no single “true” classifier for a given data set. Modern computing capabilities can easily support the Markov chain Monte Carlo sampling that is necessary to carry out the calculations involved, but the information available in these samples is not at present being fully utilised. We show how it can be allied to known results concerning the “reject option” in order to produce an assessment of the confidence that can be ascribed to particular classifications, and how these confidence measures can be used to compare the performances of classifiers. Incorporating these confidence measures can alter the apparent ranking of classifiers as given by straightforward success or error rates. Several possible methods for obtaining confidence assessments are described and compared on a range of data sets using the Bayesian probabilistic nearest-neighbour classifier.
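The "reject option" referred to here can be illustrated by the standard rule of withholding a classification whenever the largest estimated posterior class probability falls below a threshold. The Python sketch below shows only this generic rule; the paper's own confidence measures built from MCMC samples are not reproduced.

```python
# A minimal generic sketch of the "reject option": withhold a decision when the
# largest posterior class probability is below a threshold.
import numpy as np

def classify_with_reject(posteriors, threshold=0.9):
    """posteriors: (n_samples, n_classes) array with rows summing to 1.
    Returns the predicted class index, or -1 where the prediction is rejected."""
    best = posteriors.argmax(axis=1)
    conf = posteriors.max(axis=1)
    return np.where(conf >= threshold, best, -1)
```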

18.
Comparing partitions
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices, generated from the corresponding partitions using various scoring rules, by means of a simple cross-product measure. Special cases derivable include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples having the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between -1 and +1.
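As a concrete illustration of the Rand index and of a chance-corrected form of it, the following Python sketch computes both from two label vectors via their contingency table. The particular correction shown is the widely used adjusted Rand index; the triple-based measure proposed at the end of the abstract is not implemented here, and the function names are assumptions.

```python
# A minimal sketch of the Rand index and the adjusted (chance-corrected) Rand
# index, computed from two label vectors via their contingency table.
import numpy as np

def comb2(x):
    """Number of unordered pairs, x choose 2, applied elementwise."""
    return x * (x - 1) / 2.0

def rand_indices(labels1, labels2):
    labels1, labels2 = np.asarray(labels1), np.asarray(labels2)
    n = labels1.size
    # contingency table of the two partitions
    _, inv1 = np.unique(labels1, return_inverse=True)
    _, inv2 = np.unique(labels2, return_inverse=True)
    table = np.zeros((inv1.max() + 1, inv2.max() + 1))
    np.add.at(table, (inv1, inv2), 1)
    sum_cells = comb2(table).sum()              # pairs together in both partitions
    sum_rows = comb2(table.sum(axis=1)).sum()   # pairs together in partition 1
    sum_cols = comb2(table.sum(axis=0)).sum()   # pairs together in partition 2
    total = comb2(n)
    rand = (total + 2 * sum_cells - sum_rows - sum_cols) / total
    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    ari = (sum_cells - expected) / (max_index - expected)
    return rand, ari
```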

19.
The additive biclustering model for two-way two-mode object by variable data implies overlapping clusterings of both the objects and the variables, together with a weight for each bicluster (i.e., a pair of an object cluster and a variable cluster). In the data analysis, an additive biclustering model is fitted to given data by minimizing a least squares loss function. To this end, two alternating least squares (ALS) algorithms may be used: (1) PENCLUS, and (2) Baier's ALS approach. However, both algorithms suffer from inherent limitations that may hamper their performance. As a way out, based on theoretical results regarding the optimal design of ALS algorithms, a new ALS algorithm is presented in this paper. A simulation study shows this algorithm to outperform the existing ALS approaches.
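A minimal Python sketch of the additive biclustering model, and of one conditional least-squares step (estimating the bicluster weights for fixed binary memberships by ordinary regression), is given below. It illustrates the model being fitted, not the new ALS algorithm proposed in the paper; all names are assumptions.

```python
# A minimal sketch of the additive biclustering model X_hat = sum_k w_k a_k b_k^T
# and one conditional least-squares step for the weights with memberships fixed.
import numpy as np

def reconstruct(A, B, w):
    """A: (n, K) binary object memberships; B: (m, K) binary variable memberships;
    w: (K,) bicluster weights. Returns the (n, m) model reconstruction."""
    return (A * w) @ B.T

def weights_given_memberships(X, A, B):
    """Least-squares estimate of the bicluster weights for fixed memberships."""
    K = A.shape[1]
    # each bicluster k contributes the design column vec(outer(a_k, b_k))
    design = np.column_stack([np.outer(A[:, k], B[:, k]).ravel() for k in range(K)])
    w, *_ = np.linalg.lstsq(design, X.ravel(), rcond=None)
    return w
```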

20.
K-means clustering is arguably the most popular technique for partitioning data. Unfortunately, K-means suffers from the well-known problem of locally optimal solutions. Furthermore, the final partition is dependent upon the initial configuration, making the choice of starting partitions all the more important. This paper evaluates 12 procedures proposed in the literature and provides recommendations for best practices.
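As an illustration of what a starting-partition procedure looks like, the Python sketch below implements distance-weighted seeding in the spirit of k-means++. The abstract does not say whether this particular rule is among the 12 evaluated procedures; it is shown only as a generic example.

```python
# A minimal sketch of one widely used K-means initialization rule:
# distance-weighted seeding in the spirit of k-means++.
import numpy as np

def seed_centers(X, k, rng=None):
    """X: (n, d) data matrix; k: number of clusters. Returns (k, d) initial centers."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                 # first center: uniform random draw
    for _ in range(1, k):
        # squared distance of each point to its nearest already-chosen center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        probs = d2 / d2.sum()                      # favour points far from chosen centers
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)
```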

