首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A dissimilarity D on a finite set S is said to be Robinsonian if S can be totally ordered in such a way that, for every i < j < k, D (i, j) ≤ D (i, k) and D (j, k) ≤ D (i, k). Intuitively, D is Robinsonian if S can be represented by points on a line. Recognizing Robinsonian dissimilarities has many applications in seriation and classification. In this paper, we present an optimal O (n 2) algorithm to recognize Robinsonian dissimilarities, where n is the cardinal of S. Our result improves the already known algorithms.  相似文献   

2.
In this paper, we establish that the following fitting problem is NP-hard: given a finite set X and a dissimilarity measure d on X (d is a symmetric function from X 2 to the nonnegative real numbers and vanishing on the diagonal), we wish to find a Robinsonian dissimilarity d R on X minimizing the l -error ||d − d R || = maxx,y ∈X{|d(x, y) − d R (x, y)|} between d and d R . Recall that a dissimilarity d R on X is called monotone (or Robinsonian) if there exists a total order ≺ on X such that xzy implies that d(x, y) ≥ max{d(x, z), d(z, y)}. The Robinsonian dissimilarities appear in seriation and clustering problems, in sparse matrix ordering and DNA sequencing.  相似文献   

3.
The Metric Cutpoint Partition Problem   总被引:1,自引:1,他引:0  
Let G = (V, E,w) be a graph with vertex and edge sets V and E, respectively, and w: E → a function which assigns a positive weight or length to each edge of G. G is called a realization of a finite metric space (M, d), with M = {1, ..., n} if and only if {1, ..., n} ⊆ V and d(i, j) is equal to the length of the shortest chain linking i and j in Gi, j = 1, ..., n. A realization G of (M, d), is called optimal if the sum of its weights is minimal among all the realizations of (M, d). A cutpoint in a graph G is a vertex whose removal strictly increases the number of connected components of G. The Metric Cutpoint Partition Problem is to determine if a finite metric space (M, d) has an optimal realization containing a cutpoint. We prove in this paper that this problem is polynomially solvable. We also describe an algorithm that constructs an optimal realization of (M, d) from optimal realizations of subspaces that do not contain any cutpoint. Supported by grant PA002-104974/2 from the Swiss National Science Foundation. Published online xx, xx, xxxx.  相似文献   

4.
Optimal algorithms for comparing trees with labeled leaves   总被引:2,自引:1,他引:1  
LetR n denote the set of rooted trees withn leaves in which: the leaves are labeled by the integers in {1, ...,n}; and among interior vertices only the root may have degree two. Associated with each interior vertexv in such a tree is the subset, orcluster, of leaf labels in the subtree rooted atv. Cluster {1, ...,n} is calledtrivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For anyk trees inR n , thestrict consensus tree C(T 1, ...,T k ) is that tree inR n containing exactly those clusters common to every one of thek trees. Similarity between treesT 1 andT 2 inR n is measured by the numberS(T 1,T 2) of nontrivial clusters in bothT 1 andT 2; dissimilarity, by the numberD(T 1,T 2) of clusters inT 1 orT 2 but not in both. Algorithms are known to computeC(T 1, ...,T k ) inO(kn 2) time, andS(T 1,T 2) andD(T 1,T 2) inO(n 2) time. I propose a special representation of the clusters of any treeT R n , one that permits testing in constant time whether a given cluster exists inT. I describe algorithms that exploit this representation to computeC(T 1, ...,T k ) inO(kn) time, andS(T 1,T 2) andD(T 1,T 2) inO(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed inO(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142.  相似文献   

5.
Let \( \mathcal{G} \) = (G,w) be a weighted simple finite connected graph, that is, let G be a simple finite connected graph endowed with a function w from the set of the edges of G to the set of real numbers. For any subgraph G′ of G, we define w(G′) to be the sum of the weights of the edges of G′. For any i, j vertices of G, we define D {i,j}(\( \mathcal{G} \)) to be the minimum of the weights of the simple paths of G joining i and j. The D {i,j}(\( \mathcal{G} \)) are called 2-weights of \( \mathcal{G} \). Weighted graphs and their reconstruction from 2-weights have applications in several disciplines, such as biology and psychology.Let \( {\left\{{m}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) and \( {\left\{{M}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) be two families of positive real numbers parametrized by the 2-subsets of {1, …, n} with m I M I for any I; we study when there exist a positive-weighted graph G and an n-subset {1, …, n} of the set of its vertices such that D I (\( \mathcal{G} \)) ∈ [m I ,M I ] for any \( I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right) \). Then we study the analogous problem for trees, both in the case of positive weights and in the case of general weights.  相似文献   

6.
Givenk rooted binary treesA 1, A2, ..., Ak, with labeled leaves, we generateC, a unique system of lineage constraints on common ancestors. We then present an algorithm for constructing the set of rooted binary treesB, compatible with all ofA 1, A2, ..., Ak. The running time to obtain one such supertree isO(k 2 n2), wheren is the number of distinct leaves in all of the treesA 1, A2, ..., Ak.  相似文献   

7.
The median procedure for n-trees   总被引:2,自引:2,他引:0  
Let (X,d) be a metric space The functionM:X k 2 x defined by is the minimum } is called themedian procedure and has been found useful in various applications involving the notion of consensus Here we present axioms that characterizeM whenX is a certain class of trees (hierarchical classifications), andd is the symmetric difference metricWe would like to thank the referees and Editor for helpful comments  相似文献   

8.
A k-dissimilarity D on a finite set X, |X|????k, is a map from the set of size k subsets of X to the real numbers. Such maps naturally arise from edgeweighted trees T with leaf-set X: Given a subset Y of X of size k, D(Y ) is defined to be the total length of the smallest subtree of T with leaf-set Y . In case k?=?2, it is well-known that 2-dissimilarities arising in this way can be characterized by the so-called ??4-point condition??. However, in case k?>?2 Pachter and Speyer (2004) recently posed the following question: Given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k????3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2?k-element subset of X arises from some tree, and that 2?k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree, that is similar to the aforementioned 4-point condition.  相似文献   

9.
The present paper aims at showing that there are times when set theoretical knowledge increases in a non-cumulative way. In other words, what we call ‘set theory’ is not one theory which grows by simple addition of a theorem after the other, but a finite sequence of theories T 1, ..., T n in which T i+1, for 1 ≤ i < n, supersedes T i . This thesis has a great philosophical significance because it implies that there is a sense in which mathematical theories, like the theories belonging to the empirical sciences, are fallible and that, consequently, mathematical knowledge has a quasi-empirical nature. The way I have chosen to provide evidence in favour of the correctness of the main thesis of this article consists in arguing that Cantor–Zermelo set theory is a Lakatosian Mathematical Research Programme (MRP).  相似文献   

10.
We present a hierarchical classification based on n-ary relations of the entities. Starting from the finest partition that can be obtained from the attributes, we distinguish between entities having the same attributes by using relations between entities. The classification that we get is thus a refinement of this finest partition. It can be computed in O(n + m 2) space and O(n · p · m 5/2) time, where n is the number of entities, p the number of classes of the resulting hierarchy (p is the size of the output; p < 2n) and m the maximum number of relations an entity can have (usually, m ? n). So we can treat sets with millions of entities.  相似文献   

11.
Proportional link linkage (PLL) clustering methods are a parametric family of monotone invariant agglomerative hierarchical clustering methods. This family includes the single, minimedian, and complete linkage clustering methods as special cases; its members are used in psychological and ecological applications. Since the literature on clustering space distortion is oriented to quantitative input data, we adapt its basic concepts to input data with only ordinal significance and analyze the space distortion properties of PLL methods. To enable PLL methods to be used when the numbern of objects being clustered is large, we describe an efficient PLL algorithm that operates inO(n 2 logn) time andO(n 2) space.This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.  相似文献   

12.
Graphical displays which show inter-sample distances are important for the interpretation and presentation of multivariate data. Except when the displays are two-dimensional, however, they are often difficult to visualize as a whole. A device, based on multidimensional unfolding, is described for presenting some intrinsically high-dimensional displays in fewer, usually two, dimensions. This goal is achieved by representing each sample by a pair of points, sayR i andr i, so that a theoretical distance between thei-th andj-th samples is represented twice, once by the distance betweenR i andr j and once by the distance betweenR j andr i. Selfdistances betweenR i andr i need not be zero. The mathematical conditions for unfolding to exhibit symmetry are established. Algorithms for finding approximate fits, not constrained to be symmetric, are discussed and some examples are given.  相似文献   

13.
O (n 4), where n is the number of objects. We describe the application of the MVR method to two data models: the weighted least-squares (WLS) model (V is diagonal), where the MVR method can be reduced to an O(n 3) time complexity; a model arising from the study of biological sequences, which involves a complex non-diagonal V matrix that is estimated from the dissimilarity matrix Δ. For both models, we provide simulation results that show a significant error reduction in the reconstruction of T, relative to classical agglomerative algorithms.  相似文献   

14.
Assouad has shown that a real-valued distance d = (dij)1 ≤ i < j ≤ n is isometrically embeddable in ℓ1space if and only if it belongs to the cut cone on n points. Determining if this condition holds is NP-complete. We use Assouad's result in a constructive column generation algorithm for ℓ1-embeddability. The subproblem is an unconstrained 0-1 quadratic program, solved by Tabu Search and Variable Neighborhood Search heuristics as well as by an exact enumerative algorithm. Computational results are reported. Several ways to approximate a distance which is not ℓ1-embeddable by another one which is are also studied.  相似文献   

15.
In this paper, the potentialities of transvariation (Gini, 1959) in measuring the separation between two groups of multivariate observations are explored. With this aim, a modified version of Gini’s notion of multidimensional transvariation is proposed. According to Gini (1959), two groups G1 and G2 are said to transvary on the k-dimensional variable X = (X1,...,Xh,...,Xk) if there exists at least one pair of units, belonging to different groups, such that for h = 1,...,k the sign of the difference between their Xh values is opposite to that of m1h −m2h, where m1h and m2h are the corresponding group mean values of Xh. We introduce a modification that allows us to derive a measure of group separation, which can be profitably used in discriminating between two groups. The performance of the measure is tested through simulation experiments. The results show that the proposed measure is not sensitive to distributional assumptions and highlight its robustness against outliers.  相似文献   

16.
Efficient algorithms for agglomerative hierarchical clustering methods   总被引:11,自引:4,他引:7  
Whenevern objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires 0(n 3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring 0(n 3) worst-case time, can reasonably be expected to exhibit 0(n 2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires 0(n 2 logn) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case 0(n 2) time and space.Whenevern objects are characterized byk-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points ink-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires 0(n 2) time, in the worst case, for fixedk and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.  相似文献   

17.
Suppose y, a d-dimensional (d ≥ 1) vector, is drawn from a mixture of k (k ≥ 2) populations, given by ∏1, ∏2,…,∏ k . We wish to identify the population that is the most likely source of the point y. To solve this classification problem many classification rules have been proposed in the literature. In this study, a new nonparametric classifier based on the transvariation probabilities of data depth is proposed. We compare the performance of the newly proposed nonparametric classifier with classical and maximum depth classifiers using some benchmark and simulated data sets. The authors thank the editor and referees for comments that led to an improvement of this paper. This work is partially supported by the National Science Foundation under Grant No. DMS-0604726. Published online xx, xx, xxxx.  相似文献   

18.
Two properties of tree metrics are already known in the literature: tree metrics on a setX withn elements have 2n?3 degrees of freedom; a tree metric has Robinson form with regard to its minimum spanning tree (MST), or to any such MST if several of them exist. Starting from these results, we prove that a tree metrict is entirely defined by its restriction to some setB of 2n?3 entries. This set is easily determined from the table oft and includes then?1 entries of an MST. A fast method for the adjustment of a tree metric to any given metricd is then obtained. This method extends to dissimilarities.  相似文献   

19.
Measurements of p variables for n samples are collected into a n×p matrix X, where the samples belong to one of k groups. The group means are separated by Mahalanobis distances. CVA optimally represents the group means of X in an r-dimensional space. This can be done by maximizing a ratio criterion (basically one- dimensional) or, more flexibly, by minimizing a rank-constrained least-squares fitting criterion (which is not confined to being one-dimensional but depends on defining an appropriate Mahalanobis metric). In modern n < p problems, where W is not of full rank, the ratio criterion is shown not to be coherent but the fit criterion, with an attention to associated metrics, readily generalizes. In this context we give a unified generalization of CVA, introducing two metrics, one in the range space of W and the other in the null space of W, that have links with Mahalanobis distance. This generalization is computationally efficient, since it requires only the spectral decomposition of a n×n matrix.  相似文献   

20.
We present a new distance based quartet method for phylogenetic tree reconstruction, called Minimum Tree Cost Quartet Puzzling. Starting from a distance matrix computed from natural data, the algorithm incrementally constructs a tree by adding one taxon at a time to the intermediary tree using a cost function based on the relaxed 4-point condition for weighting quartets. Different input orders of taxa lead to trees having distinct topologies which can be evaluated using a maximum likelihood or weighted least squares optimality criterion. Using reduced sets of quartets and a simple heuristic tree search strategy we obtain an overall complexity of O(n 5 log2 n) for the algorithm. We evaluate the performances of the method through comparative tests and show that our method outperforms NJ when a weighted least squares optimality criterion is employed. We also discuss the theoretical boundaries of the algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号