共查询到20条相似文献,搜索用时 577 毫秒
1.
Several methods have recently been introduced for investigating relations between three interpoint proximity matricesA, B, C, each of which furnishes a different type of distance between the same objects. Smouse, Long, and Sokal (1986) investigate
the partial correlation betweenA andB conditional onC. Dow and Cheverud (1985) ask whethercorr (A, C), equalscorr (B, C). Manly (1986) investigates regression-like models for predicting one matrix as a function of others.
We have investigated rejection rates of these methods when their null hypotheses are true, but data are spatially autocorrelated
(SA). That is,A, andB are distance matrices from independent realizations of the same SA generating process, andC is a matrix of geographic connections.
SA causes all the models to be liberal because the hypothesis of equally likely row/column permutations invoked, by all these
methods, is untrue when data are SA. Consequently, we cannot unreservedly recommend the use of any of these methods with SA
data. However, if SA is weak, the Smouse-Long-Sokal method, used with a conservative critical value, is unlikely to reject
falsely. 相似文献
2.
Optimal algorithms for comparing trees with labeled leaves 总被引:2,自引:1,他引:1
William H. E. Day 《Journal of Classification》1985,2(1):7-28
LetR
n
denote the set of rooted trees withn leaves in which: the leaves are labeled by the integers in {1, ...,n}; and among interior vertices only the root may have degree two. Associated with each interior vertexv in such a tree is the subset, orcluster, of leaf labels in the subtree rooted atv. Cluster {1, ...,n} is calledtrivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For anyk trees inR
n
, thestrict consensus tree C(T
1, ...,T
k
) is that tree inR
n
containing exactly those clusters common to every one of thek trees. Similarity between treesT
1 andT
2 inR
n
is measured by the numberS(T
1,T
2) of nontrivial clusters in bothT
1 andT
2; dissimilarity, by the numberD(T
1,T
2) of clusters inT
1 orT
2 but not in both. Algorithms are known to computeC(T
1, ...,T
k
) inO(kn
2) time, andS(T
1,T
2) andD(T
1,T
2) inO(n
2) time. I propose a special representation of the clusters of any treeT R
n
, one that permits testing in constant time whether a given cluster exists inT. I describe algorithms that exploit this representation to computeC(T
1, ...,T
k
) inO(kn) time, andS(T
1,T
2) andD(T
1,T
2) inO(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed inO(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142. 相似文献
3.
We present an O(n 3)-time, O(n 2)-space algorithm to test whether a dissimilarity d on an n-object set X is Robinsonian, i.e., X admits an ordering such that i≤j≤k implies that d(x i,xk)≥max {d(xi,xj),d(xj,xk)}. 相似文献
4.
Mark de Rooij 《Journal of Classification》2002,19(1):161-178
p similarity function, the L
p
-transform and the Minkowski-p distance. For triadic distance models defined by the L
p
-transform we will prove that they do not model three-way association. Moreover, triadic distance models defined by the L
p
-transform are restricted multiple dyadic distances, where each dyadic distance is defined for a two-way margin of the three-way
table. Distance models for three-way two-mode data, called three-way distance models, do succeed in modeling three-way association. 相似文献
5.
The Metric Cutpoint Partition Problem 总被引:1,自引:1,他引:0
Let G = (V, E,w) be a graph with vertex and edge sets V and E, respectively, and w: E → a function which assigns a positive weight or length to each edge of G. G is called a realization of a finite metric space (M, d), with M = {1, ..., n} if and only if {1, ..., n} ⊆ V and d(i, j) is equal to the length of the shortest chain linking i and j in G ∀i, j = 1, ..., n. A realization G of (M, d), is called optimal if the sum of its weights is minimal among all the realizations of (M, d). A cutpoint in a graph G is a vertex whose removal strictly increases the number of connected components of G. The Metric Cutpoint Partition Problem is to determine if a finite metric space (M, d) has an optimal realization containing a cutpoint. We prove in this paper that this problem is polynomially solvable. We
also describe an algorithm that constructs an optimal realization of (M, d) from optimal realizations of subspaces that do not contain any cutpoint.
Supported by grant PA002-104974/2 from the Swiss National Science Foundation.
Published online xx, xx, xxxx. 相似文献
6.
Sampo V. Paunonen 《Journal of Classification》1984,1(1):125-131
Analytic procedures for classifying objects are commonly based on the product-moment correlation as a measure of object similarity. This statistic, however, generally does not represent an invariant index of similarity between two objects if they are measured along different bipolar variables where the direction of measurement for each variable is arbitrary. A computer simulation study compared Cohen's (1969) proposed solution to the problem, the invariant similarity coefficientr
c
, with the mean product-moment correlation based on all possible changes in the measurement direction of individual variables within a profile of scores. The empirical observation thatr
c
approaches the mean product-moment correlation with increases in the number of scores in the profiles was interpreted as encouragement for the use ofr
c
in classification research. Some cautions regarding its application were noted.This research was supported by the Social Sciences and Humanities Research Council of Canada, Grant no. 410-83-0633, and by the University of Toronto. 相似文献
7.
A dissimilarity D on a finite set S is said to be Robinsonian if S can be totally ordered in such a way that, for every i < j < k, D (i, j) ≤ D (i, k) and D (j, k) ≤ D (i, k). Intuitively, D is Robinsonian if S can be represented by points on a line. Recognizing Robinsonian dissimilarities has many applications in seriation and classification. In this paper, we present an optimal O (n 2) algorithm to recognize Robinsonian dissimilarities, where n is the cardinal of S. Our result improves the already known algorithms. 相似文献
8.
Matthijs J. Warrens 《Journal of Classification》2010,27(2):173-190
We study a family of n-way metrics that generalize the usual two-way metric. The n-way metrics are totally symmetric maps from E
n
into
\mathbbR \geqslant 0 {\mathbb{R}_{ \geqslant 0}} . The three-way metrics introduced by Joly and Le Calvé (1995) and Heiser and Bennani (1997) and the n-way metrics studied in Deza and Rosenberg (2000) belong to this family. It is shown how the n-way metrics and n-way distance measures are related to (n − 1)-way metrics, respectively, (n − 1)-way distance measures. 相似文献
9.
An algorithm to maximize the agreement between partitions 总被引:2,自引:1,他引:1
H. Messatfa 《Journal of Classification》1992,9(1):5-15
10.
Elena Rubei 《Journal of Classification》2016,33(2):282-297
Let \( \mathcal{G} \) = (G,w) be a weighted simple finite connected graph, that is, let G be a simple finite connected graph endowed with a function w from the set of the edges of G to the set of real numbers. For any subgraph G′ of G, we define w(G′) to be the sum of the weights of the edges of G′. For any i, j vertices of G, we define D {i,j}(\( \mathcal{G} \)) to be the minimum of the weights of the simple paths of G joining i and j. The D {i,j}(\( \mathcal{G} \)) are called 2-weights of \( \mathcal{G} \). Weighted graphs and their reconstruction from 2-weights have applications in several disciplines, such as biology and psychology.Let \( {\left\{{m}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) and \( {\left\{{M}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) be two families of positive real numbers parametrized by the 2-subsets of {1, …, n} with m I ≤ M I for any I; we study when there exist a positive-weighted graph G and an n-subset {1, …, n} of the set of its vertices such that D I (\( \mathcal{G} \)) ∈ [m I ,M I ] for any \( I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right) \). Then we study the analogous problem for trees, both in the case of positive weights and in the case of general weights. 相似文献
11.
In this paper, we establish that the following fitting problem is NP-hard: given a finite set X and a dissimilarity measure d on X (d is a symmetric function from X
2
to the nonnegative real numbers and vanishing on the diagonal), we wish to find a Robinsonian dissimilarity d
R
on X minimizing the l
∞
-error ||d − d
R
||
∞
= maxx,y
∈X{|d(x, y) − d
R
(x, y)|} between d and d
R
. Recall that a dissimilarity d
R
on X is called monotone (or Robinsonian) if there exists a total order ≺ on X such that x ≺ z ≺ y implies that d(x, y) ≥ max{d(x, z), d(z, y)}. The Robinsonian dissimilarities appear in seriation and clustering problems, in sparse matrix ordering and DNA sequencing. 相似文献
12.
Thaddeus Tarpey 《Journal of Classification》1998,15(1):57-79
The set of k points that optimally represent a distribution in terms of mean squared error have been called principal points (Flury 1990). Principal points are a special case of self-consistent points. Any given set of k distinct points in R
p
induce a partition of R
p
into Voronoi regions or domains of attraction according to minimal distance. A set of k points are called self-consistent for a distribution if each point equals the conditional mean of the distribution over its respective Voronoi region. For
symmetric multivariate distributions, sets of self-consistent points typically form symmetric patterns. This paper investigates
the optimality of different symmetric patterns of self-consistent points for symmetric multivariate distributions and in particular
for the bivariate normal distribution. These results are applied to the problem of estimating principal points. 相似文献
13.
On some significance tests in cluster analysis 总被引:1,自引:1,他引:0
Bock H.H. 《Journal of Classification》1985,2(1):77-108
We investigate the properties of several significance tests for distinguishing between the hypothesisH of a homogeneous population and an alternativeA involving clustering or heterogeneity, with emphasis on the case of multidimensional observationsx
1, ...,x
n
p
. Four types of test statistics are considered: the (s-th) largest gap between observations, their mean distance (or similarity), the minimum within-cluster sum of squares resulting from a k-means algorithm, and the resulting maximum F statistic. The asymptotic distributions underH are given forn and the asymptotic power of the tests is derived for neighboring alternatives. 相似文献
14.
In this paper, we propose a bicriterion objective function for clustering a given set ofN entities, which minimizes [d–(1–)s], where 01, andd ands are the diameter and the split of the clustering, respectively. When =1, the problem reduces to minimum diameter clustering, and when =0, maximum split clustering. We show that this objective provides an effective way to compromise between the two often conflicting criteria. While the problem is NP-hard in general, a polynomial algorithm with the worst-case time complexityO(N
2) is devised to solve the bipartition version. This algorithm actually gives all the Pareto optimal bipartitions with respect to diameter and split, and it can be extended to yield an efficient divisive hierarchical scheme. An extension of the approach to the objective [(d
1+d
2)–2(1–)s] is also proposed, whered
1 andd
2 are diameters of the two clusters of a bipartition.This research was supported in part by the National Science and Engineering Research Council of Canada (Grant OGP 0104900). The authors wish to thank two anonymous referees, whose detailed comments on earlier drafts improved the paper. 相似文献
15.
Gianluigi Oliveri 《Foundations of Science》2006,11(1-2):41-79
The present paper aims at showing that there are times when set theoretical knowledge increases in a non-cumulative way. In other words, what we call ‘set theory’ is not one theory which grows by simple addition of a theorem after the other, but a finite sequence of theories T 1, ..., T n in which T i+1, for 1 ≤ i < n, supersedes T i . This thesis has a great philosophical significance because it implies that there is a sense in which mathematical theories, like the theories belonging to the empirical sciences, are fallible and that, consequently, mathematical knowledge has a quasi-empirical nature. The way I have chosen to provide evidence in favour of the correctness of the main thesis of this article consists in arguing that Cantor–Zermelo set theory is a Lakatosian Mathematical Research Programme (MRP). 相似文献
16.
Tudor B. Ionescu Géraldine Polaillon Frédéric Boulanger 《Journal of Classification》2010,27(2):136-157
We present a new distance based quartet method for phylogenetic tree reconstruction, called Minimum Tree Cost Quartet Puzzling.
Starting from a distance matrix computed from natural data, the algorithm incrementally constructs a tree by adding one taxon
at a time to the intermediary tree using a cost function based on the relaxed 4-point condition for weighting quartets. Different
input orders of taxa lead to trees having distinct topologies which can be evaluated using a maximum likelihood or weighted
least squares optimality criterion. Using reduced sets of quartets and a simple heuristic tree search strategy we obtain an
overall complexity of O(n
5 log2
n) for the algorithm. We evaluate the performances of the method through comparative tests and show that our method outperforms
NJ when a weighted least squares optimality criterion is employed. We also discuss the theoretical boundaries of the algorithm. 相似文献
17.
Assouad has shown that a real-valued distance d = (dij)1 ≤ i < j ≤ n is isometrically embeddable in ℓ1space if and only if it belongs to the cut cone on n points. Determining if this condition holds is NP-complete. We use Assouad's
result in a constructive column generation algorithm for ℓ1-embeddability. The subproblem is an unconstrained 0-1 quadratic program, solved by Tabu Search and Variable Neighborhood
Search heuristics as well as by an exact enumerative algorithm. Computational results are reported. Several ways to approximate
a distance which is not ℓ1-embeddable by another one which is are also studied. 相似文献
18.
T.J. Smith 《Journal of Classification》2000,17(2):225-242
1 optimization under linear inequality constraints based upon iteratively reweighted iterative projection (or IRIP). IRIP is
compared to a linear programming (LP) strategy for L1 minimization (Sp?th 1987, Chapter 5.3) using the ultrametric condition as an exemlar class of constraints to be fitted. Coded
for general constraints, the LP approach proves to be faster. Both methods, however, suffer from a serious limitation in being
unable to process reasonably-sized data sets because of storage requirements for the constraints. When the simplicity of vector
projections is used to allow IRIP to be coded for specific (in this case, ultrametric) constraints, we obtain a fast and efficient
algorithm capable of handling large data sets. It is also possible to extend IRIP to operate as a heuristic search strategy
that simultaneously identifies both a reasonable set of constraints to impose and the optimally-estimated parameters satisfying these constraints. A few noteworthy characteristics of L1 optimal ultrametrics are discussed, including other strategies for reformulating the ultrametric optimization problem. 相似文献
19.
Nedret Billor Asheber Abebe Asuman Turkmen Sai V. Nudurupati 《Journal of Classification》2008,25(2):249-260
Suppose y, a d-dimensional (d ≥ 1) vector, is drawn from a mixture of k (k ≥ 2) populations, given by ∏1, ∏2,…,∏
k
. We wish to identify the population that is the most likely source of the point y. To solve this classification problem many classification rules have been proposed in the literature. In this study, a new
nonparametric classifier based on the transvariation probabilities of data depth is proposed. We compare the performance of
the newly proposed nonparametric classifier with classical and maximum depth classifiers using some benchmark and simulated
data sets.
The authors thank the editor and referees for comments that led to an improvement of this paper. This work is partially supported
by the National Science Foundation under Grant No. DMS-0604726.
Published online xx, xx, xxxx. 相似文献
20.
Maurizio Vichi 《Journal of Classification》1999,16(1):27-44
X is
the automatic hierarchical classification of one mode (units or variables or
occasions) of X on the basis of the other two. In this paper the
case of OMC of units according to variables and occasions is discussed. OMC is
the synthesis of a set of hierarchical classifications Delta obtained from
X; e.g., the OMC of units is the consensus (synthesis) among the set
of dendograms individually defined by clustering units on the basis of
variables, separately for each given occasion of X. However,
because Delta is often formed by a large number of classifications, it may be
unrealistic that a single synthesis is representative of the entire set. In
this case, subsets of similar (homegeneous) dendograms may be found in Delta
so that a consensus representative of each subset may be identified. This
paper proposes, PARtition and Least Squares Consensus cLassifications Analysis (PARLSCLA) of a set of
r hierarchical classifications Delta. PARLSCLA identifies the best
least-squares partition of Delta into m (1 <= m <= r)
subsets of homogeneous dendograms and simultaneously detects the closest
consensus classification (a median classification called Least Squares
Consensus Dendogram (LSCD) for each subset. PARLSCLA is a generalization of the
problem to find a least-squares consensus dendogram for Delta. PARLSCLA is
formalized as a mixed-integer programming problem and solved with an iterative,
two-step algorithm. The method proposed is applied to an empirical data set. 相似文献