20 similar documents found; search took 31 ms.
1.
Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm
This paper presents the development of a new methodology that simultaneously estimates, in a least-squares fashion, both an ultrametric tree and respective variable weightings for profile data that have been converted into (weighted) Euclidean distances. We first review the relevant classification literature on this topic. The new methodology is presented, including the alternating least-squares algorithm used to estimate the parameters. The method is applied to a synthetic data set with known structure as a test of its operation. An application of this new methodology to ethnic group rating data is also discussed. Finally, extensions of the procedure to model additive, multiple, and three-way trees are mentioned. The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.
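An ultrametric tree of the kind fitted here corresponds to a distance matrix satisfying the three-point condition d(i,j) ≤ max(d(i,k), d(j,k)) for every triple, i.e. the two largest of the three pairwise distances coincide. A minimal Python sketch of that check (the toy distance matrix is invented for illustration, not from the paper):

```python
import itertools

def is_ultrametric(d, tol=1e-9):
    """Check the three-point condition for every triple of objects:
    among d(i,j), d(i,k), d(j,k), the two largest must be equal (up to tol)."""
    n = len(d)
    for i, j, k in itertools.combinations(range(n), 3):
        a, b, c = sorted([d[i][j], d[i][k], d[j][k]])
        if c - b > tol:  # two largest differ: not ultrametric
            return False
    return True

# A tree joining leaves {0,1} at height 1, then leaf 2 at height 3:
d = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
print(is_ultrametric(d))  # True
```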
2.
In quantum computation, non-classical features such as superposition states and entanglement are used to solve problems in new ways, impossible on classical digital computers. We illustrate by Deutsch's algorithm how a quantum computer can use superposition states to outperform any classical computer. We comment on the view of a quantum computer as a massively parallel computer and recall Amdahl's law for a classical parallel computer. We argue that the view of quantum computation as massively parallel computation disregards the presence of entanglement in a general quantum computation and the non-classical way in which parallel results are combined to obtain the final output.
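Deutsch's algorithm mentioned above is small enough to simulate classically with 4-dimensional state vectors. A minimal sketch (the basis ordering and names are my own, not from the paper): prepare |0⟩|1⟩, apply Hadamards, make a single oracle call, interfere, and measure the first qubit, which is 0 when f is constant and 1 when f is balanced:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def deutsch(f):
    """Decide with one oracle query whether f: {0,1} -> {0,1} is
    constant or balanced, using basis ordering |x, y> -> index 2x + y."""
    # Oracle U_f |x, y> = |x, y XOR f(x)> as a 4x4 permutation matrix.
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    state = np.kron(np.array([1, 0]), np.array([0, 1]))  # |0>|1>
    state = np.kron(H, H) @ state                        # superposition
    state = U @ state                                    # one oracle call
    state = np.kron(H, np.eye(2)) @ state                # interference
    p1 = state[2] ** 2 + state[3] ** 2                   # P(first qubit = 1)
    return "balanced" if p1 > 0.5 else "constant"

print(deutsch(lambda x: 0))  # constant
print(deutsch(lambda x: x))  # balanced
```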
3.
Geert De Soete, Journal of Classification, 1984, 1(1): 235-242
The least squares algorithm for fitting ultrametric trees to proximity data originally proposed by Carroll and Pruzansky and further elaborated by De Soete is extended to handle missing data. A Monte Carlo evaluation reveals that the algorithm is capable of recovering an ultrametric tree underlying an incomplete set of error-perturbed dissimilarities quite well. Geert De Soete is Aangesteld Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.
4.
The representation of three-way proximity data by single and multiple tree structure models
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for the spatial representation of such proximity data, have been devised to accommodate two-, three-, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering, called INDCLUS (Carroll and Arabie 1983), has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for the discrete network representation of such proximity data, have thus far been devised only to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various (discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths. We initially present an introductory overview focusing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications.
5.
Consider N entities to be classified (e.g., geographical areas), a matrix of dissimilarities
between pairs of entities, and a graph H with vertices associated with these entities such that the edges join the vertices corresponding to contiguous entities. The split of a
cluster is the smallest dissimilarity between an entity of this cluster and an entity outside of
it. The single-linkage algorithm (ignoring contiguity between entities) provides partitions into M clusters for which the smallest split of the clusters, called split of the partition, is
maximum. We study here the partitioning of the set of entities into M connected clusters
for all M between N - 1 and 2 (i.e., clusters such that the subgraphs of H induced by their
corresponding sets of entities are connected) with maximum split subject to that condition.
We first provide an exact algorithm with O(N^2) complexity for the particular case in which H is a tree. This algorithm suggests in turn a first heuristic algorithm for the general problem. Several variants of this heuristic are also explored. We then present an exact
algorithm for the general case based on iterative determination of cocycles of subtrees and on the solution of auxiliary set covering problems. As solution of the latter problems is
time-consuming for large instances, we provide another heuristic in which the auxiliary
set covering problems are solved approximately. Computational results obtained with the
exact and heuristic algorithms are presented on test problems from the literature.
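The split definitions above translate directly into code. A minimal Python sketch that ignores the contiguity graph H (the dissimilarity matrix and partition are invented for illustration):

```python
def split_of_cluster(D, cluster, universe):
    """Smallest dissimilarity between an entity in the cluster
    and an entity outside it."""
    outside = [e for e in universe if e not in cluster]
    return min(D[i][j] for i in cluster for j in outside)

def split_of_partition(D, partition):
    """Split of a partition: the smallest split over its clusters."""
    universe = [e for c in partition for e in c]
    return min(split_of_cluster(D, c, universe) for c in partition)

# Two tight groups {0,1} and {2,3}, loosely connected across groups:
D = [[0, 1, 5, 6],
     [1, 0, 4, 7],
     [5, 4, 0, 2],
     [6, 7, 2, 0]]
print(split_of_partition(D, [{0, 1}, {2, 3}]))  # 4
```

The single-linkage algorithm mentioned in the abstract maximizes exactly this quantity over all partitions into M clusters when contiguity is ignored.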
6.
Tom Addis, Jan Townsend Addis, Dave Billinge, David Gooding, Bart-Floris Visscher, Foundations of Science, 2008, 13(1): 5-16
We argue from the Church-Turing thesis (Kleene, Mathematical Logic, New York: Wiley, 1967) that a program can be considered as equivalent to a formal language similar to predicate calculus, where predicates can be taken as functions. We can relate such a calculus to Wittgenstein's first major work, the Tractatus, and use the Tractatus and its theses as a model of the formal classical definition of a computer program. However, Wittgenstein found flaws in his initial great work and explored these flaws in a new thesis described in his second great work, the Philosophical Investigations. The question we address is "can computer science make the same leap?" We propose, because of the flaws identified by Wittgenstein, that computers will never have the possibility of natural communication with people unless they become active participants of human society. The essential difference between formal models used in computing and human communication is that formal models are based upon rational sets, whereas people are not so restricted. We introduce irrational sets as a concept that requires the use of an abductive inference system. However, formal models are still considered central to our means of using hypotheses through deduction to make predictions about the world. These formal models must be continually updated in response to people's changes in their way of seeing the world. We propose that one mechanism used to keep track of these changes is the Peircean abductive loop.
7.
Data holders, such as statistical institutions and financial organizations, have a very serious and demanding task when producing data for official and public use: controlling the risk of identity disclosure and protecting sensitive information when they communicate data-sets among themselves, to governmental agencies, and to the public. One of the techniques applied is micro-aggregation. In a Bayesian setting, micro-aggregation can be viewed as the optimal partitioning of the original data-set based on the minimization of an appropriate measure of discrepancy, or distance, between two posterior distributions, one conditional on the original data-set and the other conditional on the aggregated data-set. Assuming d-variate normal data-sets and using several measures of discrepancy, it is shown that the asymptotically optimal equal-probability m-partition of R^d, with m^{1/d} a positive integer, is the convex one provided by hypercubes whose sides are formed by hyperplanes perpendicular to the canonical axes, no matter which discrepancy measure has been used. On the basis of this result, a method that produces a sub-optimal partition at very small computational cost is presented.
8.
Jean-Pierre Barthélemy, Journal of Classification, 1988, 5(2): 229-236
A class of (multiple) consensus methods for n-trees (dendroids, hierarchical classifications) is studied. This class constitutes an extension of the so-called median consensus in the sense that we get two numbers m and M such that: if a cluster X occurs in k n-trees of a profile P, with k &gt;= M, then it occurs in every consensus n-tree of P; if X occurs in k n-trees of P, with m &lt;= k &lt; M, then it may, or may not, belong to a consensus n-tree of P; if X occurs in k n-trees of P, with k &lt; m, then it cannot occur in any consensus n-tree of P. If these conditions are satisfied, the multiconsensus function is said to be thresholded by the pair (m, M). Two results are obtained. The first characterizes the pairs of numbers that can be viewed as thresholds for some consensus function. The second provides a characterization of thresholded consensus methods. As an application, a characterization of the quota rules is provided.
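A quota rule of the kind characterized above can be sketched in a few lines: a cluster enters the consensus exactly when it occurs in at least q of the input hierarchies. The representation below (a hierarchy as a plain collection of clusters) and the example profile are my own, not from the paper:

```python
from collections import Counter

def quota_consensus(profile, q):
    """Quota rule: keep every cluster occurring in at least q of the
    input hierarchies; q = len(profile)//2 + 1 gives majority rule."""
    counts = Counter(c for tree in profile
                     for c in set(map(frozenset, tree)))
    return {c for c, k in counts.items() if k >= q}

t1 = [{'a'}, {'b'}, {'c'}, {'a', 'b'}, {'a', 'b', 'c'}]
t2 = [{'a'}, {'b'}, {'c'}, {'a', 'b'}, {'a', 'b', 'c'}]
t3 = [{'a'}, {'b'}, {'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
cons = quota_consensus([t1, t2, t3], q=2)
print(frozenset({'a', 'b'}) in cons)  # True: occurs in 2 of 3 trees
print(frozenset({'b', 'c'}) in cons)  # False: occurs in only 1
```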
9.
Designing models of complex phenomena is a difficult task in engineering that can be tackled by composing a number of partial models to produce a global model of the phenomena. We propose to embed the partial models in software agents and to implement their composition as a cooperative negotiation between the agents. The resulting multiagent system provides a global model of a phenomenon. We applied this approach to modelling two complex physiological processes: heart rate regulation and the glucose-insulin metabolism. Beyond the effectiveness demonstrated in these two applications, the idea of using models associated with software agents to account for complex phenomena is in accordance with current tendencies in epistemology, where an increasing use of computational models for scientific explanation and analysis is evident. Therefore, our approach has not only practical but also theoretical significance: agents embedding models are a technology suitable both for representing and for investigating reality.
Francesco Amigoni
10.
B. G. Mirkin, Journal of Classification, 1987, 4(1): 7-31
We review methods of qualitative factor analysis (QFA) developed by the author and his collaborators over the last decade and discuss the use of QFA methods for the additive clustering problem. The QFA method consists of, first, finding, in a fixed set of Boolean matrices with simple structures, a square Boolean matrix that approximates a given similarity matrix, and, second, repeating this process again and again on residual similarity matrices. We present convergence properties for three versions of the method, provide cluster interpretations for results obtained from the algorithms, and give formulas for evaluating the factor shares of the variance of the initial similarities. I am indebted to Professor P. Arabie and the referees for valuable comments and editing of the text.
11.
Five different methods for obtaining a rational initial estimate of the stimulus space in the INDSCAL model were compared using the SINDSCAL program for fitting INDSCAL. The effects of the number of stimuli, the number of subjects, the dimensionality, and the amount of error on the quality and efficiency of the final SINDSCAL solution were investigated in a Monte Carlo study. We found that the quality of the final solution was not affected by the choice of the initialization method, suggesting that SINDSCAL finds a global optimum regardless of the initialization method used. The most efficient procedures were the methods proposed by de Leeuw and Pruzansky (1978) and by Flury and Gautschi (1986) for the simultaneous diagonalization of several positive definite symmetric matrices, and a method based on linearly constraining the stimulus space using the CANDELINC approach developed by Carroll, Pruzansky, and Kruskal (1980). Geert De Soete is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek. The authors gratefully acknowledge the helpful comments and suggestions of the reviewers.
12.
Clique optimization: A method to construct parsimonious ultrametric trees from similarity data
N. Sriram, Journal of Classification, 1990, 7(1): 33-52
13.
In this paper we describe in some detail a formal computer model of inferential discourse based on a belief system. The key
issue is that a logical model in a computer, based on rational sets, can usefully model a human situation based on irrational
sets. The background of this work is explained elsewhere, as is the issue of rational and irrational sets (Billinge and Addis,
in: Magnani and Dossena (eds.), Computing, philosophy and cognition, 2004; Stepney et al., Journey: Non-classical philosophy—socially
sensitive computing in journeys non-classical computation: A grand challenge for computing research, 2004). The model is based
on the Belief System (Addis and Gooding, Proceedings of the AISB’99 Symposium on Scientific Creativity, 1999) and it provides
a mechanism for choosing queries based on a range of belief. We explain how it provides a way to update the belief based on
query results, thus modelling others’ experience by inference. We also demonstrate that for the same internal experience,
different models can be built for different actors.
Tom Addis
14.
Syntactic and structural models specify relationships between their constituents but cannot show what outcomes their interaction
would produce over time in the world. Simulation consists in iterating the states of a model, so as to produce behaviour over
a period of simulated time. Iteration enables us to trace the implications and outcomes of inference rules and other assumptions
implemented in the models that make up a theory. We apply this method to experiments which we treat as models of the particular
aspects of reality they are designed to investigate. Scientific experiments are constantly designed and re-designed in the
context of implementation and use. They mediate between theoretical understanding and the practicalities of engaging with
the empirical and social world. In order to model experiments we need to identify and represent features that all experiments
have in common. We treat these features as parameters of a general model of experiment so that by varying these parameters
different types of experiment can be modelled.
D. C. Gooding
15.
In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions. We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following four models: (i) arbitrary covariance matrices, (ii) common principal components, (iii) proportional covariance matrices, and (iv) identical covariance matrices. Using Monte Carlo simulation to estimate expected error rates, we study the performance of the four discrimination procedures for five different parameter setups corresponding to standard situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for two to four groups. Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong. The authors wish to thank the editor and three referees for their helpful comments on the first draft of this article. M. J. Schmid was supported by grants no. 2.724-0.85 and 2.038-0.86 of the Swiss National Science Foundation.
16.
An alternating combinatorial optimization approach to fitting the INDCLUS and generalized INDCLUS models
This paper presents a general approach for fitting the ADCLUS (Shepard and Arabie 1979; Arabie, Carroll, DeSarbo, and Wind 1981), INDCLUS (Carroll and Arabie 1983), and potentially a special case of the GENNCLUS (DeSarbo 1982) models. The proposed approach, based largely on a separability property observed for the least squares loss function being optimized, offers increased efficiency and other advantages over existing approaches like MAPCLUS (Arabie and Carroll 1980) for fitting the ADCLUS model, and the INDCLUS method for fitting the INDCLUS model. The new procedure (called SINDCLUS) is applied to three sets of empirical data to demonstrate the effectiveness of the SINDCLUS methodology. Finally, some potentially useful extensions are discussed.
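The underlying ADCLUS-family model reconstructs a similarity matrix from weighted, possibly overlapping cluster memberships: s_ij = sum_k w_k p_ik p_jk + c. A minimal sketch of that reconstruction (the membership matrix and weights are invented for illustration):

```python
import numpy as np

def adclus_similarities(P, w, c=0.0):
    """Model similarities under ADCLUS: s_ij = sum_k w_k * p_ik * p_jk + c,
    where P is a binary n x K membership matrix (clusters may overlap)
    and w holds one nonnegative weight per cluster."""
    return P @ np.diag(w) @ P.T + c

# Three objects, two overlapping clusters {0,1} and {1,2}:
P = np.array([[1, 0],
              [1, 1],
              [0, 1]])
S = adclus_similarities(P, w=[2.0, 3.0])
print(S[0, 1], S[1, 2], S[0, 2])  # 2.0 3.0 0.0
```

Fitting methods such as the SINDCLUS procedure above estimate P and w from an observed similarity matrix in least squares; this sketch only shows the model's forward direction.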
17.
Minimum sum of diameters clustering
The problem of determining a partition of a given set of N entities into M clusters such that the sum of the diameters of these clusters is minimum has been studied by Brucker (1978). He proved that it is NP-complete for M &gt;= 3 and mentioned that its complexity was unknown for M = 2. We provide an O(N^3 log N) algorithm for this latter case. Moreover, we show that determining a partition into two clusters which minimizes any given function of the diameters can be done in O(N^5) time. Acknowledgments: This research was supported by the Air Force Office of Scientific Research Grant AFOSR 0271 to Rutgers University. We are grateful to Yves Crama for several insightful remarks and to an anonymous referee for detailed comments.
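For small instances, the M = 2 case can be checked by brute force over all bipartitions. This exponential sketch is only a reference point for the paper's polynomial algorithm, not that algorithm itself (the dissimilarity matrix is invented for illustration):

```python
from itertools import combinations

def diameter(D, cluster):
    """Largest dissimilarity within a cluster (0 for singletons)."""
    return max((D[i][j] for i, j in combinations(cluster, 2)), default=0)

def min_sum_of_diameters_2(D):
    """Exhaustive search over all bipartitions: exponential in N,
    usable only as a small-instance reference."""
    n = len(D)
    best = (float('inf'), None)
    for mask in range(1, 2 ** (n - 1)):  # entity n-1 stays in cluster A
        B = [i for i in range(n) if (mask >> i) & 1]
        A = [i for i in range(n) if not (mask >> i) & 1]
        cost = diameter(D, A) + diameter(D, B)
        if cost < best[0]:
            best = (cost, (A, B))
    return best

# Two tight groups {0,1} (diameter 1) and {2,3} (diameter 2):
D = [[0, 1, 8, 9],
     [1, 0, 7, 8],
     [8, 7, 0, 2],
     [9, 8, 2, 0]]
cost, parts = min_sum_of_diameters_2(D)
print(cost)  # 3
```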
18.
In this paper we want to examine how the mutual understanding of speakers is reached during a conversation through collaborative processes, and what role is played by abductive inference (in the Peircean sense) in these processes. We do this by bringing together contributions from a variety of disciplines, such as logic, philosophy of language, and psychology. When speakers are engaged in a conversation, they refer to a supposed common ground: every participant ascribes to the others some knowledge, beliefs, opinions, etc., on which to rely in order to reach mutual understanding. As the conversation unfolds, this common ground is continually corrected and reshaped by the interchanges. Abductive reasoning takes place, in a collaborative setting, in order to build new possible theories about the common ground. In reconstructing this process through a working example, we argue that integrating a collaborative perspective within the Peircean theory of abduction can help solve some of the drawbacks that its critics have outlined, for example its permissiveness and its lack of generativity.
Roberta Ferrario
19.
This paper proposes a measure of spatial homogeneity for sets of d-dimensional points based on nearest neighbor distances. Tests for spatial uniformity are examined which assess the tendency of the entire data set to aggregate and evaluate the character of individual clusters. The sizes and powers of three statistical tests of uniformity against aggregation, regularity, and unimodality are studied to determine robustness. The paper also studies the effects of normalization and incorrect prior information. A percentile frame sampling procedure is proposed that does not require a sampling window but is superior to a toroidal frame and to buffer zone sampling in particular situations. Examples test two data sets for homogeneity and search the results of a hierarchical clustering for homogeneous clusters. This work was partially supported by NSF Grant ECS-8300204.
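The basic ingredient of such tests, the mean nearest-neighbor distance, can be sketched in a few lines: aggregated point sets yield markedly smaller values than uniform ones of the same size. The sample sizes, cluster centers, and seed below are invented for illustration, not taken from the paper:

```python
import numpy as np

def mean_nn_distance(X):
    """Mean Euclidean nearest-neighbor distance of an n x d point set."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude each point's distance to itself
    return D.min(axis=1).mean()

rng = np.random.default_rng(0)
uniform = rng.uniform(0, 1, size=(200, 2))          # spatially uniform
clustered = np.concatenate([rng.normal(c, 0.02, size=(100, 2))
                            for c in ([0.2, 0.2], [0.8, 0.8])])  # two blobs
print(mean_nn_distance(clustered) < mean_nn_distance(uniform))  # True
```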
20.
Least squares algorithms for constructing constrained ultrametric and additive tree representations of symmetric proximity data
A mathematical programming algorithm is developed for fitting ultrametric or additive trees to proximity data where external constraints are imposed on the topology of the tree. The two procedures minimize a least squares loss function. The method is illustrated on both synthetic and real data. A constrained ultrametric tree analysis was performed on similarities between 32 subjects based on preferences for ten odors, while a constrained additive tree analysis was carried out on some proximity data between kinship terms. Finally, some extensions of the methodology to other tree fitting procedures are mentioned. The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.