
1.

Lowdimensional Additive Overlapping Clustering

Dirk Depril Iven Van Mechelen Tom F. Wilderjans 《Journal of Classification》2012,29(3):297-320

To reveal the structure underlying two-way two-mode object by variable data, Mirkin (1987) has proposed an additive overlapping clustering model. This model implies an overlapping clustering of the objects and a reconstruction of the data, with the reconstructed variable profile of an object being a summation of the variable profiles of the clusters it belongs to. Grasping the additive (overlapping) clustering structure of object by variable data may, however, be seriously hampered in case the data include a very large number of variables. To deal with this problem, we propose a new model that simultaneously clusters the objects in overlapping clusters and reduces the variable space; as such, the model implies that the cluster profiles and, hence, the reconstructed data profiles are constrained to lie in a lowdimensional space. An alternating least squares (ALS) algorithm to fit the new model to a given data set will be presented, along with a simulation study and an illustrative example that makes use of empirical data.
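The reconstruction rule described in this abstract can be illustrated with a minimal sketch. The toy numbers, the function names `reconstruct` and `low_rank_profiles`, and the scores-times-loadings parametrization of the low-dimensional constraint are all assumptions made for illustration; the paper's exact parametrization and its ALS fitting procedure are not shown here.

```python
import numpy as np

def reconstruct(membership, profiles):
    """Sum, for each object, the variable profiles of the clusters it belongs to."""
    return membership @ profiles

def low_rank_profiles(scores, loadings):
    """Build cluster profiles that lie in a low-dimensional space (assumed form: scores @ loadings.T)."""
    return scores @ loadings.T

# Hypothetical toy data: 4 objects, 2 overlapping clusters, 3 variables.
membership = np.array([[1, 0],
                       [0, 1],
                       [1, 1],   # this object belongs to both clusters
                       [0, 0]])  # this object belongs to no cluster
profiles = np.array([[1.0, 2.0, 0.0],
                     [0.5, 0.0, 3.0]])
reconstructed = reconstruct(membership, profiles)

# Profiles constrained to a one-dimensional space (hypothetical values).
scores = np.array([[1.0], [2.0]])            # clusters x dimensions
loadings = np.array([[1.0], [0.0], [2.0]])   # variables x dimensions
constrained = low_rank_profiles(scores, loadings)
constrained_reconstruction = reconstruct(membership, constrained)
```

In this sketch the third row of `reconstructed` is the sum of both cluster profiles, and every row of `constrained_reconstruction` is a multiple of the single loading vector.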

2.

K-modes Clustering (total citations: 2; self-citations: 0; citations by others: 2)

Anil Chaturvedi Paul E. Green J. Douglas Carroll 《Journal of Classification》2001,18(1):35-55

… L_0 norm (defined as the limit of an L_p norm as p approaches zero). In Monte Carlo simulations, both K-modes and the latent class procedures (e.g., Goodman 1974) performed with equal efficiency in recovering a known underlying cluster structure. However, K-modes is an order of magnitude faster than the latent class procedure and suffers from fewer problems of local optima than do the latent class procedures. For data sets involving a large number of categorical variables, latent class procedures become computationally extremely slow and hence infeasible. We conjecture that, although in some cases latent class procedures might perform better than K-modes, K-modes could outperform latent class procedures in other cases. Hence, we recommend that these two approaches be used as "complementary" procedures in performing cluster analysis. We also present an empirical comparison of K-modes and latent class, where the former method prevails.
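As a rough illustration of clustering categorical data with a mismatch-count loss, the following is a generic K-modes-style sketch. It is an assumption-laden toy, not the authors' implementation: the function names `mismatches` and `k_modes`, the tie-breaking rules, and the example records are all hypothetical.

```python
from collections import Counter

def mismatches(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, modes, max_iter=100):
    """Alternate nearest-mode assignment and per-attribute mode updates."""
    modes = [tuple(m) for m in modes]
    labels = []
    for _ in range(max_iter):
        # Assign each record to the mode with the fewest mismatches (ties: lowest index).
        labels = [min(range(len(modes)), key=lambda k: mismatches(r, modes[k]))
                  for r in records]
        new_modes = []
        for k, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if not members:
                new_modes.append(old)  # keep the mode of an empty cluster unchanged
                continue
            # The new mode takes the most frequent category of each attribute.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes

# Hypothetical records with two categorical attributes each.
records = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("c", "z")]
labels, modes = k_modes(records, modes=[records[0], records[3]])
```

Like k-means, a procedure of this kind depends on the starting modes, so in practice it would typically be run from several initializations.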

3.

Model-Based Clustering

Paul D. McNicholas 《Journal of Classification》2016,33(3):331-373

The notion of defining a cluster as a component in a mixture model was put forth by Tiedeman in 1955; since then, the use of mixture models for clustering has grown into an important subfield of classification. Considering the volume of work within this field over the past decade, which seems equal to all of that which went before, a review of work to date is timely. First, the definition of a cluster is discussed and some historical context for model-based clustering is provided. Then, starting with Gaussian mixtures, the evolution of model-based clustering is traced, from the famous paper by Wolfe in 1965 to work that is currently available only in preprint form. This review ends with a look ahead to the next decade or so.

4.

CLUSCALE ("CLUstering and multidimensional SCAL[E]ing"): A Three-Way Hybrid Model Incorporating Overlapping Clustering and Multidimensional Scaling Structure

Anil Chaturvedi J. Douglas Carroll 《Journal of Classification》2006,23(2):269-299

Traditional techniques of perceptual mapping hypothesize that stimuli are differentiated in a common perceptual space of quantitative attributes. This paper enhances traditional perceptual mapping techniques such as multidimensional scaling (MDS) which assume only continuously valued dimensions by presenting a model and methodology called CLUSCALE for capturing stimulus differentiation due to perceptions that are qualitative, in addition to quantitative or continuously varying perceptual attributes or dimensions. It provides models and OLS parameter estimation procedures for both a two-way and a three-way version of this general model. Since the two-way version of the model and method has already been discussed by Chaturvedi and Carroll (2000), and a stochastic variant discussed by Navarro and Lee (2003), we shall deal in this paper almost entirely with the three-way version of this model. We recommend the use of the three-way approach over the two-way approach, since the three-way approach both accounts for and takes advantage of the heterogeneity in subjects’ perceptions of stimuli to provide maximal information; i.e., it explicitly deals with individual differences among subjects.

5.

Clustering Functional Data (total citations: 1; self-citations: 0; citations by others: 1)

Thaddeus Tarpey Kimberly K. J. Kinateder 《Journal of Classification》2003,20(1):93-114


6.

A Note on K-modes Clustering (total citations: 2; self-citations: 0; citations by others: 2)

Zhexue Huang Michael K. Ng 《Journal of Classification》2003,20(2):257-261

Recently, Chaturvedi, Green and Carroll (2001) presented a nonparametric approach to deriving clusters from categorical data using a new clustering procedure called K-modes. Huang (1998) proposed the K-modes clustering algorithm. In this note, we demonstrate the equivalence of the two K-modes procedures.

7.

Point Clustering via Voting Maximization

Costas Panagiotakis 《Journal of Classification》2015,32(2):212-240


8.

Solving Non-Uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms

Alberto Fernández Sergio Gómez 《Journal of Classification》2008,25(1):43-65

In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach for solving this drawback has been to take any arbitrary criterion in order to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists in grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams’ formula which enables the implementation of the algorithm in a recursive way. The authors thank A. Arenas for discussion and helpful comments. This work was partially supported by DGES of the Spanish Government Project No. FIS2006–13321–C02–02 and by a grant of Universitat Rovira i Virgili.

9.

Wavelet-based Fuzzy Clustering of Time Series

Elizabeth Ann Maharaj Pierpaolo D’Urso Don U. A. Galagedera 《Journal of Classification》2010,27(2):231-275

Traditional procedures for clustering time series are based mostly on crisp hierarchical or partitioning methods. Given that the dynamics of a time series may change over time, a time series might display patterns that may enable it to belong to one cluster over one period while over another period, its pattern may be more consistent with those in another cluster. The traditional clustering procedures are unable to identify the changing patterns over time. However, clustering based on fuzzy logic will be able to detect the switching patterns from one time period to another thus enabling some time series to simultaneously belong to more than one cluster. In particular, this paper proposes a fuzzy approach to the clustering of time series based on their variances through wavelet decomposition. We will show that this approach will distinguish between time series with different patterns in variability as well as identifying time series with switching patterns in variability.

10.

Kernel-Based Methods to Identify Overlapping Clusters with Linear and Nonlinear Boundaries

Chiheb-Eddine Ben N’Cir Nadia Essoussi Mohamed Limam 《Journal of Classification》2015,32(2):176-211


11.

Clustering and isolation in the consensus problem for partitions

Dean A Neumann Victor T. Norton Jr 《Journal of Classification》1986,3(2):281-297

We examine the problem of aggregating several partitions of a finite set into a single consensus partition. We note that the dual concepts of clustering and isolation are especially significant in this connection. The hypothesis that a consensus partition should respect unanimity with respect to either concept leads us to stress a consensus interval rather than a single partition. The extremes of this interval are characterized axiomatically. If a sufficient totality of traits has been measured, and if measurement errors are independent, then a true classifying partition can be expected to lie in the consensus interval. The structure of the partitions in the interval lends itself to partial solutions of the consensus problem. Conditional entropy may be used to quantify the uncertainty inherent in the interval as a whole.
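To make the conditional-entropy remark concrete, here is a minimal sketch of one standard definition, H(P | Q) for two partitions of the same item set, computed from label sequences. How the paper applies conditional entropy to the consensus interval is not stated in the abstract, so the function name `conditional_entropy`, the use of bits, and the example partitions are assumptions.

```python
from collections import Counter
from math import log2

def conditional_entropy(labels_p, labels_q):
    """Return H(P | Q) in bits for two partitions given as label sequences over the same items."""
    n = len(labels_p)
    joint = Counter(zip(labels_p, labels_q))   # sizes of the block intersections
    marginal_q = Counter(labels_q)             # block sizes of Q
    return -sum(count / n * log2(count / marginal_q[q])
                for (_, q), count in joint.items())

# Hypothetical partitions of four items.
fine = [0, 0, 1, 1]      # two blocks of two items
coarse = [0, 0, 0, 0]    # a single block

uncertainty_fine_given_coarse = conditional_entropy(fine, coarse)
uncertainty_coarse_given_fine = conditional_entropy(coarse, fine)
```

Knowing the coarse partition leaves one bit of uncertainty about the fine one, whereas knowing the fine partition determines the coarse one completely.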

12.

A Proposal for Robust Curve Clustering

Luis Angel Garcia-Escudero Alfonso Gordaliza 《Journal of Classification》2005,22(2):185-201

Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector it is preferable to think of them as functions, and many classical multivariate techniques have been generalized for this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Therefore, given a functional basis, an approach for doing curve clustering relies on applying the k-means methodology to the fitted basis coefficients corresponding to all the curves in the data set. Unfortunately, a serious drawback follows from the lack of robustness of k-means. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matran 1997) provides a robust alternative to the use of k-means and, consequently, it may be successfully used in this functional framework. The proposed approach will be exemplified by considering cubic B-splines bases, but other bases can be applied analogously depending on the application at hand.
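The trimming idea can be sketched as follows, assuming the basis coefficients have already been fitted (the B-spline fitting step is omitted). This is a simplified illustration rather than the authors' procedure: the function name `trimmed_kmeans`, the fixed starting centers, the trimming rule (drop the fraction `alpha` of points farthest from their nearest center at each iteration), and the coefficient values are all assumptions.

```python
import numpy as np

def trimmed_kmeans(points, centers, alpha=0.1, max_iter=100):
    """Lloyd-style iterations that ignore the alpha fraction of points farthest from their center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    n_keep = len(points) - int(np.floor(alpha * len(points)))
    trimmed_labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        nearest = dists.min(axis=1)
        # Keep the n_keep points closest to their center; label the rest -1 (trimmed).
        keep = np.argsort(nearest, kind="stable")[:n_keep]
        trimmed_labels = np.full(len(points), -1)
        trimmed_labels[keep] = labels[keep]
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = points[trimmed_labels == k]
            if len(members):
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return trimmed_labels, centers

# Hypothetical two-dimensional coefficient vectors; the last row is an outlying curve.
coefficients = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [100, 100]]
curve_labels, curve_centers = trimmed_kmeans(
    coefficients, centers=[[0, 0], [10, 10]], alpha=0.15)
```

With ordinary k-means the outlying row would pull the second center toward it; here it is labeled -1 and excluded from the center update.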

13.

The Isolation Approach to Hierarchical Clustering

Hans-Rolf Gregorius 《Journal of Classification》2004,21(1):51-69


14.

On the Complexity of Ordinal Clustering

Rahul Shah Martin Farach-Colton 《Journal of Classification》2006,23(1):79-102

Given a set of pairwise distances on a set of n points, constructing an edge-weighted tree whose leaves are these n points such that the tree distances would mimic the original distances under some criteria is a fundamental problem. One such criterion is to preserve the ordinal relation between the pairwise distances. The ordinal relation can be of the form of total order on the distances or it can be some partial order specified on the pairwise distances. We show that the problem of finding a weighted tree, if it exists, which would preserve the total order on pairwise distances is NP-hard. We also show the NP-hardness of the problem of finding a weighted tree which would preserve a particular kind of partial order called a triangle order, one of the most fundamental partial orders considered in computational biology.

15.

Performance Analysis of Hierarchical Clustering Algorithms

Olivier Gascuel Andy McKenzie 《Journal of Classification》2004,21(1):3-18


16.

Variable Selection for Clustering and Classification

Jeffrey L. Andrews Paul D. McNicholas 《Journal of Classification》2014,31(2):136-153

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering algorithms are based upon determining the best variable subspace according to model fitting in a stepwise manner. These techniques are often computationally intensive and can require extended periods of time to run; in fact, some are prohibitively computationally expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering/classification methods. Our approach is illustrated on both simulated and real data, highlighted by contrasting its performance with that of other comparable variable selection techniques on the real data sets.

17.

Model-Based Clustering for Conditionally Correlated Categorical Data

Matthieu Marbac Christophe Biernacki Vincent Vandewalle 《Journal of Classification》2015,32(2):145-175


18.

Minimum Sum of Squares Clustering in a Low Dimensional Space

Pierre Hansen Brigitte Jaumard Nenad Mladenovic 《Journal of Classification》1998,15(1):37-55

Clustering with a criterion which minimizes the sum of squared distances to cluster centroids is usually done in a heuristic way. An exact polynomial algorithm, with a complexity in O(N^(p+1) log N), is proposed for minimum sum of squares hierarchical divisive clustering of points in a p-dimensional space with small p. Empirical complexity is one order of magnitude lower. Data sets with N = 20000 for p = 2, N = 1000 for p = 3, and N = 200 for p = 4 are clustered in a reasonable computing time.
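The criterion itself, and a single exact divisive step in the simplest case p = 1, can be sketched as below. This is not the algorithm proposed in the paper: it is a brute-force illustration that relies on the fact that, in one dimension, an optimal two-cluster split is contiguous in sorted order. The function names `sum_of_squares` and `best_split_1d` and the data values are hypothetical.

```python
import numpy as np

def sum_of_squares(points):
    """Sum of squared distances from the points to their centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def best_split_1d(values):
    """Exact minimum sum-of-squares bipartition of one-dimensional data (needs at least 2 values)."""
    values = np.sort(np.asarray(values, dtype=float))
    best = None
    # Try every cut position in sorted order and keep the cheapest split.
    for cut in range(1, len(values)):
        cost = sum_of_squares(values[:cut]) + sum_of_squares(values[cut:])
        if best is None or cost < best[0]:
            best = (cost, values[:cut], values[cut:])
    return best

# Hypothetical one-dimensional data with two obvious groups.
cost, left, right = best_split_1d([10, 1, 12, 2, 11, 3])
```

A divisive hierarchy would apply such a split recursively to each resulting cluster; for p > 1 the search over candidate splits is what drives the complexity quoted in the abstract.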

19.

TOBAE: A Density-based Agglomerative Clustering Algorithm

Shehzad Khalid Shahid Razzaq 《Journal of Classification》2015,32(2):241-267


20.

An Extraction and Regularization Approach to Additive Clustering

Michael D. Lee 《Journal of Classification》1999,16(2):255-281
