首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 734 毫秒
1.
A mathematical programming approach to fitting general graphs   总被引:1,自引:1,他引:0  
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.  相似文献   

2.
L2 -norm: (1) dynamic programming; (2) an iterative quadratic assignment improvement heuristic; (3) the Guttman update strategy as modified by Pliner's technique of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and Tse. The methods are all implemented through (freely downloadable) MATLAB m-files; their use is illustrated by a common data set carried throughout. For the computationally intensive dynamic programming formulation that can a globally optimal solution, several possible computational improvements are discussed and evaluated using (a) a transformation of a given m-function with the MATLAB Compiler into C code and compiling the latter; (b) rewriting an m-function and a mandatory MATLAB gateway directly in Fortran and compiling into a MATLAB callable file; (c) comparisons of the acceleration of raw m-files implemented under the most recent release of MATLAB Version 6.5 (and compared to the absence of such acceleration under the previous MATLAB Version 6.1). Finally, and in contrast to the combinatorial optimization task of identifying a best unidimensional scaling for a given proximity matrix, an approach is given for the confirmatory fitting of a given unidimensional scaling based only on a fixed object ordering, and to nonmetric unidensional scaling that incorporates an additional optimal monotonic transformation of the proximities.  相似文献   

3.
A mathematical programming algorithm is developed for fitting ultrametric or additive trees to proximity data where external constraints are imposed on the topology of the tree. The two procedures minimize a least squares loss function. The method is illustrated on both synthetic and real data. A constrained ultrametric tree analysis was performed on similarities between 32 subjects based on preferences for ten odors, while a constrained additive tree analysis was carried out on some proximity data between kinship terms. Finally, some extensions of the methodology to other tree fitting procedures are mentioned.The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.  相似文献   

4.
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups of models: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for thespatial representation of such proximity data, have been devised to accommodate two, three, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering called INDCLUS (Carroll and Arabie 1983) has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for thediscrete network representation of such proximity data, have only thus far been devised to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various(discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths.We initially present an introductory overview focussing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy-data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications.  相似文献   

5.
Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.  相似文献   

6.
A latent class vector model for preference ratings   总被引:1,自引:1,他引:1  
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and some possible extensions are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”  相似文献   

7.
ADditive CLUStering (ADCLUS) is a tool for overlapping clustering of two-way proximity matrices (objects?×?objects). In Simple Additive Fuzzy Clustering (SAFC), a variant of ADCLUS is introduced providing a fuzzy partition of the objects, that is the objects belong to the clusters with the so-called membership degrees ranging from zero (complete non-membership) to one (complete membership). INDCLUS (INdividual Differences CLUStering) is a generalization of ADCLUS for handling three-way proximity arrays (objects?×?objects?×?subjects). Here, we propose a fuzzified alternative to INDCLUS capable to offer a fuzzy partition of the objects by generalizing in a three-way context the idea behind SAFC. This new model is called Fuzzy INdividual Differences CLUStering (FINDCLUS). An algorithm is provided for fitting the FINDCLUS model to the data. Finally, the results of a simulation experiment and some applications to synthetic and real data are discussed.  相似文献   

8.
This paper presents a general approach for fitting the ADCLUS (Shepard and Arabie 1979; Arabie, Carroll, DeSarbo, and Wind 1981), INDCLUS (Carroll and Arabie 1983), and potentially a special case of the GENNCLUS (DeSarbo 1982) models. The proposed approach, based largely on a separability property observed for the least squares loss function being optimized, offers increased efficiency and other advantages over existing approaches like MAPCLUS (Arabie and Carroll 1980) for fitting the ADCLUS model, and the INDCLUS method for fitting the INDCLUS model. The new procedure (called SINDCLUS) is applied to three sets of empirical data to demonstrate the effectiveness of the SINDCLUS methodology. Finally, some potentially useful extensions are discussed.  相似文献   

9.
Improvements to the dynamic programming (DP) strategy for partitioning (nonhierarchical classification) as discussed in Hubert, Arabie, and Meulman (2001) are proposed. First, it is shown how the number of evaluations in the DP process can be decreased without affecting generality. Both a completely nonredundant and a quasi-nonredundant method are proposed. Second, an efficient implementation of both approaches is discussed. This implementation is shown to have a dramatic increase in speed over the original program. The flexibility of the approach is illustrated by analyzing three data sets.  相似文献   

10.
A sequential fitting procedure for linear data analysis models   总被引:1,自引:1,他引:0  
A particular factor analysis model with parameter constraints is generalized to include classification problems definable within a framework of fitting linear models. The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks. SEFIT methods attempt to explain the variability in the initial data (commonly defined by a sum of squares) through an additive decomposition attributable to the various terms in the model. New methods are developed for both traditional and fuzzy clustering that have useful theoretic and computational properties (principal cluster analysis, additive clustering, and so on). Connections to several known classification strategies are also stated.The author is grateful to P. Arabie and L. J. Hubert for editorial assistance and reviewing going well beyond traditional levels.  相似文献   

11.
Clique optimization (CLOPT) is a family of graph clustering procedures that construct parsimonious ultrametrics by executing a sequence of divisive and agglomerative operations. Every CLOPT procedure is associated with a distinct graph-partitioning heuristic. Seven HCS methods, a mathematical programming algorithm, and two CLOPT heuristics were evaluated on simulated data. These data were obtained by distorting ultrametric partitions and hierarchies. In general, internally optimal models yielded externally optimal models. By recovering near-optimal solutions more consistently, CLOPT2 emerged as the most robust technique.  相似文献   

12.
Classical unidimensional scaling provides a difficult combinatorial task. A procedure formulated as a nonlinear programming (NLP) model is proposed to solve this problem. The new method can be implemented with standard mathematical programming software. Unlike the traditional procedures that minimize either the sum of squared error (L 2 norm) or the sum pf absolute error (L 1 norm), the proposed method can minimize the error based on any L p norm for 1 ≤p < ∞. Extensions of the NLP formulation to address a multidimensional scaling problem under the city-block model are also discussed.  相似文献   

13.
A modified CANDECOMP algorithm is presented for fitting the metric version of the Extended INDSCAL model to three-way proximity data. The Extended INDSCAL model assumes, in addition to the common dimensions, a unique dimension for each object. The modified CANDECOMP algorithm fits the Extended INDSCAL model in a dimension-wise fashion and ensures that the subject weights for the common and the unique dimensions are nonnegative. A Monte Carlo study is reported to illustrate that the method is fairly insensitive to the choice of the initial parameter estimates. A second Monte Carlo study shows that the method is able to recover an underlying Extended INDSCAL structure if present in the data. Finally, the method is applied for illustrative purposes to some empirical data on pain relievers. In the final section, some other possible uses of the new method are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijik Onderzoek”.  相似文献   

14.
Data in many different fields come to practitioners through a process naturally described as functional. We propose a classification procedure of oxidation curves. Our algorithm is based on two stages: fitting the functional data by linear splines with free knots and classifying the estimated knots which estimate useful oxidation parameters. A real data set on 57 oxidation curves is used to illustrate our approach.  相似文献   

15.
对关孝和<天文数学杂著>中的天文历算工作进行较为全面的研究,特别指出了"<授时历>求五星定合定积定星校正图解"中的"改正术"与<授时历经>记载大致相同;"日景实测"中的"独特算法"与和算中求圆周率术和求弧背术有密切联系;<元史>中已论及盈缩和迟疾二种不均匀改正,并且<元史>中所述内容包括了关孝和定交日和交定度算法的改正术.图解是关氏历算工作的精华.关孝和亲自观测和校验,以其独特的视角,融合中国古代的数理天文方法,进一步完善了<授时历>的历算工作,有的还有所超越,在日本开辟出新的研究领域.关氏的天文历算工作代表了日本传统历算的最高水平.  相似文献   

16.
Ultrametric tree representations of incomplete dissimilarity data   总被引:2,自引:2,他引:0  
The least squares algorithm for fitting ultrametric trees to proximity data originally proposed by Carroll and Pruzansky and further elaborated by De Soete is extended to handle missing data. A Monte Carlo evaluation reveals that the algorithm is capable of recovering an ultrametric tree underlying an incomplete set of error-perturbed dissimilarities quite well.Geert De Soete is Aangesteld Navorser of the Belgian National Fonds voor Wetenschappelijk Onderzoek.  相似文献   

17.
1 optimization under linear inequality constraints based upon iteratively reweighted iterative projection (or IRIP). IRIP is compared to a linear programming (LP) strategy for L1 minimization (Sp?th 1987, Chapter 5.3) using the ultrametric condition as an exemlar class of constraints to be fitted. Coded for general constraints, the LP approach proves to be faster. Both methods, however, suffer from a serious limitation in being unable to process reasonably-sized data sets because of storage requirements for the constraints. When the simplicity of vector projections is used to allow IRIP to be coded for specific (in this case, ultrametric) constraints, we obtain a fast and efficient algorithm capable of handling large data sets. It is also possible to extend IRIP to operate as a heuristic search strategy that simultaneously identifies both a reasonable set of constraints to impose and the optimally-estimated parameters satisfying these constraints. A few noteworthy characteristics of L1 optimal ultrametrics are discussed, including other strategies for reformulating the ultrametric optimization problem.  相似文献   

18.
An asymmetric multidimensional scaling model and an associated nonmetric algorithm to analyze two-mode three-way proximities (object × object × source) are introduced. The model consists of a common object configuration and two kinds of weights, i.e., for both symmetry and asymmetry. In the common object configuration, each object is represented by a point and a circle (sphere, hypersphere) in a Euclidean space. The common object configuration represents pairwise proximity relationships between pairs of objects for the ‘group’ of all sources. Each source has its own symmetry weight and a set of asymmetry weights. Symmetry weights represent individual differences among sources of data in symmetric proximity relationships, and asymmetry weights represent individual differences among sources in asymmetric proximity relationships. The associated nonmetric algorithm, based on Kruskal’s (1964b) nonmetric multidimensional scaling algorithm, is an extension of the algorithm for the asymmetric multidimensional scaling of one mode two-way proximities developed earlier (Okada and Imaizumi 1987). As an illustrative example, we analyze intergenerational occupational mobility from 1955 to 1985 in Japan among eight occupational categories.  相似文献   

19.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, which are curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, then yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach and optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives including regression mixture models and the K-means-like algorithm for piecewise regression demonstrate the effectiveness of the proposed approach.  相似文献   

20.
Variable Selection for Clustering and Classification   总被引:2,自引:2,他引:0  
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering algorithms are based upon determining the best variable subspace according to model fitting in a stepwise manner. These techniques are often computationally intensive and can require extended periods of time to run; in fact, some are prohibitively computationally expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering/classification methods. Our approach is illustrated on both simulated and real data, highlighted by contrasting its performance with that of other comparable variable selection techniques on the real data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号