共查询到20条相似文献,搜索用时 31 毫秒
1.
Thomas J. Smith 《Journal of Classification》2001,18(2):185-207
L
1) criterion. Examples of ultrametric and additive trees fitted to two extant data sets are given, plus a Monte Carlo analysis
to assess the impact of both typical data error and extreme values on fitted trees. Solutions are compared to the least-squares
(L
2) approach of Hubert and Arabie (1995a), with results indicating that (with these data) the L
1 and L
2 optimization strategies perform very similarly. A number of observations are made concerning possible uses of an L
1 approach, the nature and number of identified locally optimal solutions, and metric recovery differences between ultrametrics
and additive trees. 相似文献
2.
NP-hard Approximation Problems in Overlapping Clustering 总被引:1,自引:1,他引:0
Lp
-norm (p < ∞). These problems also correspond to the approximation by a strongly Robinson dissimilarity or by a dissimilarity fulfilling
the four-point inequality (Bandelt 1992; Diatta and Fichet 1994). The results are extended to circular strongly Robinson dissimilarities,
indexed k-hierarchies (Jardine and Sibson 1971, pp. 65-71), and to proper dissimilarities satisfying the Bertrand and Janowitz (k + 2)-point inequality (Bertrand and Janowitz 1999). Unidimensional scaling (linear or circular) is reinterpreted as a clustering
problem and its hardness is established, but only for the L
1 norm. 相似文献
3.
Classical unidimensional scaling provides a difficult combinatorial task. A procedure formulated as a nonlinear programming
(NLP) model is proposed to solve this problem. The new method can be implemented with standard mathematical programming software.
Unlike the traditional procedures that minimize either the sum of squared error (L
2 norm) or the sum pf absolute error (L
1 norm), the proposed method can minimize the error based on any L
p
norm for 1 ≤p < ∞. Extensions of the NLP formulation to address a multidimensional scaling problem under the city-block model are also
discussed. 相似文献
4.
Michael W. Trosset 《Journal of Classification》1998,15(1):15-35
A natural extension of classical metric multidimensional scaling is proposed. The result is a new formulation of nonmetric
multidimensional scaling in which the strain criterion is minimized subject to order constraints on the disparity variables.
Innovative features of the new formulation include: the parametrization of the p-dimensional distance matrices by the positive semidefinite matrices of rank ≤p; optimization of the (squared) disparity variables, rather than the configuration coordinate variables; and a new nondegeneracy
constraint, which restricts the set of (squared) disparities rather than the set of distances. Solutions are obtained using
an easily implemented gradient projection method for numerical optimization. The method is applied to two published data
sets. 相似文献
5.
Clique optimization: A method to construct parsimonious ultrametric trees from similarity data 总被引:1,自引:1,他引:0
N. Sriram 《Journal of Classification》1990,7(1):33-52
6.
Optimal Variable Weighting for Ultrametric and Additive Trees and K-means Partitioning: Methods and Software 总被引:1,自引:0,他引:1
K -means partitioning. We also describe some new features and improvements to the algorithm proposed by De Soete. Monte Carlo
simulations have been conducted using different error conditions. In all cases (i.e., ultrametric or additive trees, or K-means partitioning), the simulation results indicate that the optimal weighting procedure should be used for analyzing data
containing noisy variables that do not contribute relevant information to the classification structure. However, if the data
involve error-perturbed variables that are relevant to the classification or outliers, it seems better to cluster or partition
the entities by using variables with equal weights. A new computer program, OVW, which is available to researchers as freeware,
implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a
new algorithm for optimal variable weighting for K-means partitioning. 相似文献
7.
Mark de Rooij 《Journal of Classification》2002,19(1):161-178
p similarity function, the L
p
-transform and the Minkowski-p distance. For triadic distance models defined by the L
p
-transform we will prove that they do not model three-way association. Moreover, triadic distance models defined by the L
p
-transform are restricted multiple dyadic distances, where each dyadic distance is defined for a two-way margin of the three-way
table. Distance models for three-way two-mode data, called three-way distance models, do succeed in modeling three-way association. 相似文献
8.
Tudor B. Ionescu Géraldine Polaillon Frédéric Boulanger 《Journal of Classification》2010,27(2):136-157
We present a new distance based quartet method for phylogenetic tree reconstruction, called Minimum Tree Cost Quartet Puzzling.
Starting from a distance matrix computed from natural data, the algorithm incrementally constructs a tree by adding one taxon
at a time to the intermediary tree using a cost function based on the relaxed 4-point condition for weighting quartets. Different
input orders of taxa lead to trees having distinct topologies which can be evaluated using a maximum likelihood or weighted
least squares optimality criterion. Using reduced sets of quartets and a simple heuristic tree search strategy we obtain an
overall complexity of O(n
5 log2
n) for the algorithm. We evaluate the performances of the method through comparative tests and show that our method outperforms
NJ when a weighted least squares optimality criterion is employed. We also discuss the theoretical boundaries of the algorithm. 相似文献
9.
Within the non-iterative procedures for performing a correspondence analysis with linear constraints, a strategy is proposed
to impose linear constraints in analyzing a contingency table with one or two ordered sets of categories. At the heart of
the approach is the partition of the Pearson chi-squared statistics which involves terms that summarize the association between
the nominal/ordinal variables using bivariate moments based on orthogonal polynomials. Linear constraints are then included
directly in suitable matrices reflecting the most important components, overcoming also the problem of imposing linear constraints
based on subjective decisions. 相似文献
10.
Michael J. Brusco 《Journal of Classification》2002,19(1):45-67
L
1-norm are also presented. I conclude that the
computational scaling problems depends largely on the criterion of interest,
with unidimensional scaling problems depends largely on the criterion of
interest, with unidimensional scaling in the L
1-norm being
especially challenging. 相似文献
11.
Givenk rooted binary treesA
1, A2, ..., Ak, with labeled leaves, we generateC, a unique system of lineage constraints on common ancestors. We then present an algorithm for constructing the set of rooted binary treesB, compatible with all ofA
1, A2, ..., Ak. The running time to obtain one such supertree isO(k
2 n2), wheren is the number of distinct leaves in all of the treesA
1, A2, ..., Ak. 相似文献
12.
The representation of three-way proximity data by single and multiple tree structure models 总被引:4,自引:4,他引:0
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups of models: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for thespatial representation of such proximity data, have been devised to accommodate two, three, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering called INDCLUS (Carroll and Arabie 1983) has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for thediscrete network representation of such proximity data, have only thus far been devised to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various(discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths.We initially present an introductory overview focussing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy-data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications. 相似文献
13.
Ryszard Wójcicki 《Foundations of Science》1995,1(4):471-516
This paper was written with two aims in mind. A large part of it is just an exposition of Tarski's theory of truth. Philosophers do not agree on how Tarski's theory is related to their investigations. Some of them doubt whether that theory has any relevance to philosophical issues and in particular whether it can be applied in dealing with the problems of philosophy (theory) of science.In this paper I argue that Tarski's chief concern was the following question. Suppose a language L belongs to the class of languages for which, in full accordance with some formal conditions set in advance, we are able to define the class of all the semantic interpretations the language may acquire. Every interpretation of L can be viewed as a certain structure to which the expressions of the language may refer. Suppose that a specific interpretation of the language L was singled out as the intended one. Suppose, moreover, that the intended interpretation can be characterized in a metalanguage L
+. If the above assumptions are satisfied, can the notion of truth for L be defined in the metalanguage L
+ and, if it can, how can this be done? 相似文献
14.
Nedret Billor Asheber Abebe Asuman Turkmen Sai V. Nudurupati 《Journal of Classification》2008,25(2):249-260
Suppose y, a d-dimensional (d ≥ 1) vector, is drawn from a mixture of k (k ≥ 2) populations, given by ∏1, ∏2,…,∏
k
. We wish to identify the population that is the most likely source of the point y. To solve this classification problem many classification rules have been proposed in the literature. In this study, a new
nonparametric classifier based on the transvariation probabilities of data depth is proposed. We compare the performance of
the newly proposed nonparametric classifier with classical and maximum depth classifiers using some benchmark and simulated
data sets.
The authors thank the editor and referees for comments that led to an improvement of this paper. This work is partially supported
by the National Science Foundation under Grant No. DMS-0604726.
Published online xx, xx, xxxx. 相似文献
15.
L2
-norm: (1)
dynamic programming; (2) an iterative quadratic assignment improvement
heuristic; (3) the Guttman update strategy as modified by Pliner's technique
of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and
Tse. The methods are all implemented through (freely downloadable) MATLAB
m-files; their use is illustrated by a common data set carried throughout. For
the computationally intensive dynamic programming formulation that can a
globally optimal solution, several possible computational improvements are
discussed and evaluated using (a) a transformation of a given m-function with
the MATLAB Compiler into C code and compiling the latter; (b) rewriting an
m-function and a mandatory MATLAB gateway directly in Fortran and compiling
into a MATLAB callable file; (c) comparisons of the acceleration of raw
m-files implemented under the most recent release of MATLAB Version 6.5 (and compared to the absence of such
acceleration under the previous MATLAB Version 6.1). Finally, and in contrast
to the combinatorial optimization task of identifying a best unidimensional
scaling for a given proximity matrix, an approach is given for the
confirmatory fitting of a given unidimensional scaling based only on a fixed
object ordering, and to nonmetric unidensional scaling that incorporates an
additional optimal monotonic transformation of the proximities. 相似文献
16.
Let G = (V,E,w) be a graph with vertex and edge sets V and E, respectively, and w:E → R + a function which assigns a positive weight or length to each edge of G. G is called a realization of a finite metric space
(M,d), with M = { 1,...,n} if and only if { 1,...,n} ⫅ V and d(i,j) is equal to the length of the shortest chain linking i
and j in G ∀ i,j = 1,...,n. A realization G of (M,d), is said optimal if the sum of its weights is minimal among all the realizations
of (M,d). Consider a partition of M into two nonempty subsets K and L, and let e be an edge in a realization G of (M,d); we
say that e is a bridge linking K with L if e belongs to all chains in G linking a vertex of K with a vertex of L. The Metric
Bridge Partition Problem is to determine if the elements of a finite metric space (M,d) can be partitioned into two nonempty
subsets K and L such that all optimal realizations of (M,d) contain a bridge linking K with L. We prove in this paper that
this problem is polynomially solvable. We also describe an algorithm that constructs an optimal realization of (M,d) from
optimal realizations of (K,d|K) and (L,d|L). 相似文献
17.
Maurizio Vichi 《Journal of Classification》1999,16(1):27-44
X is
the automatic hierarchical classification of one mode (units or variables or
occasions) of X on the basis of the other two. In this paper the
case of OMC of units according to variables and occasions is discussed. OMC is
the synthesis of a set of hierarchical classifications Delta obtained from
X; e.g., the OMC of units is the consensus (synthesis) among the set
of dendograms individually defined by clustering units on the basis of
variables, separately for each given occasion of X. However,
because Delta is often formed by a large number of classifications, it may be
unrealistic that a single synthesis is representative of the entire set. In
this case, subsets of similar (homegeneous) dendograms may be found in Delta
so that a consensus representative of each subset may be identified. This
paper proposes, PARtition and Least Squares Consensus cLassifications Analysis (PARLSCLA) of a set of
r hierarchical classifications Delta. PARLSCLA identifies the best
least-squares partition of Delta into m (1 <= m <= r)
subsets of homogeneous dendograms and simultaneously detects the closest
consensus classification (a median classification called Least Squares
Consensus Dendogram (LSCD) for each subset. PARLSCLA is a generalization of the
problem to find a least-squares consensus dendogram for Delta. PARLSCLA is
formalized as a mixed-integer programming problem and solved with an iterative,
two-step algorithm. The method proposed is applied to an empirical data set. 相似文献
18.
Marek Ancukiewicz 《Journal of Classification》1998,15(1):129-141
I consider a new problem of classification into n(n ≥ 2) disjoint classes based on features of unclassified data. It is assumed that the data are grouped into m(M ≥ n) disjoint sets and within each set the distribution of features is a mixture of distributions corresponding to particular
classes. Moreover, the mixing proportions should be known and form a matrix of rank n. The idea of solution is, first, to estimate feature densities in all the groups, then to solve the linear system for component
densities. The proposed classification method is asymptotically optimal, provided a consistent method of density estimation
is used. For illustration, the method is applied to determining perfusion status in myocardial infarction patients, using
creatine kinase measurements. 相似文献
19.
Least squares algorithms for constructing constrained ultrametric and additive tree representations of symmetric proximity data 总被引:1,自引:1,他引:0
A mathematical programming algorithm is developed for fitting ultrametric or additive trees to proximity data where external constraints are imposed on the topology of the tree. The two procedures minimize a least squares loss function. The method is illustrated on both synthetic and real data. A constrained ultrametric tree analysis was performed on similarities between 32 subjects based on preferences for ten odors, while a constrained additive tree analysis was carried out on some proximity data between kinship terms. Finally, some extensions of the methodology to other tree fitting procedures are mentioned.The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek. 相似文献
20.
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this. 相似文献