20 similar documents found (search time: 93 ms)
1.
Marek Ancukiewicz 《Journal of Classification》1998,15(1):129-141
I consider a new problem of classification into n (n ≥ 2) disjoint classes based on features of unclassified data. It is assumed that the data are grouped into m (m ≥ n) disjoint sets and that within each set the distribution of features is a mixture of the distributions corresponding to the particular
classes. Moreover, the mixing proportions should be known and form a matrix of rank n. The idea of the solution is, first, to estimate the feature densities in all the groups and then to solve the linear system for the component
densities. The proposed classification method is asymptotically optimal, provided a consistent method of density estimation
is used. For illustration, the method is applied to determining perfusion status in myocardial infarction patients, using
creatine kinase measurements.
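The two steps described in this abstract (estimate the group densities, then solve the linear system for the class densities) can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the simplest case of two groups and two classes, histogram density estimates over shared bins, and hypothetical function names (`unmix_two_classes`, `classify`) and example numbers chosen only for illustration.

```python
def unmix_two_classes(group_densities, mixing):
    """Recover two class densities from two group densities.

    group_densities: [g0, g1], histogram estimates over the same bins.
    mixing: [[a, b], [c, d]], where mixing[j][i] is the known proportion
            of class i in group j, so that g_j = sum_i mixing[j][i] * f_i.
    """
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        # The abstract requires a mixing matrix of full rank n.
        raise ValueError("mixing matrix must have full rank")
    g0, g1 = group_densities
    # Apply the inverse of the 2x2 mixing matrix bin by bin.
    f0 = [(d * x - b * y) / det for x, y in zip(g0, g1)]
    f1 = [(-c * x + a * y) / det for x, y in zip(g0, g1)]
    return f0, f1


def classify(bin_index, class_densities, priors):
    """Assign a bin to the class with the largest prior-weighted density."""
    scores = [p * f[bin_index] for p, f in zip(priors, class_densities)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: group 0 is 80% class 0, group 1 is 30% class 0.
mixing = [[0.8, 0.2], [0.3, 0.7]]
group_densities = [[0.40, 0.50, 0.10], [0.15, 0.50, 0.35]]
f0, f1 = unmix_two_classes(group_densities, mixing)
label_low = classify(0, (f0, f1), (0.5, 0.5))
label_high = classify(2, (f0, f1), (0.5, 0.5))
```

With more groups than classes (m > n) the system is overdetermined, and a least-squares solve would replace the explicit 2×2 inverse used here.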
2.
Optimal Variable Weighting for Ultrametric and Additive Trees and K-means Partitioning: Methods and Software (total citations: 1, self-citations: 0, citations by others: 1)
K-means partitioning. We also describe some new features and improvements to the algorithm proposed by De Soete. Monte Carlo
simulations have been conducted using different error conditions. In all cases (i.e., ultrametric or additive trees, or K-means partitioning), the simulation results indicate that the optimal weighting procedure should be used for analyzing data
containing noisy variables that do not contribute relevant information to the classification structure. However, if the data
involve error-perturbed variables that are relevant to the classification or outliers, it seems better to cluster or partition
the entities by using variables with equal weights. A new computer program, OVW, which is available to researchers as freeware,
implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a
new algorithm for optimal variable weighting for K-means partitioning.
3.
Optimization Strategies for Two-Mode Partitioning (total citations: 2, self-citations: 2, citations by others: 0)
Joost van Rosmalen Patrick J. F. Groenen Javier Trejos William Castillo 《Journal of Classification》2009,26(2):155-181
Two-mode partitioning is a relatively new form of clustering that clusters both rows and columns of a data matrix. In this
paper, we consider deterministic two-mode partitioning methods in which a criterion similar to k-means is optimized. A variety of optimization methods have been proposed for this type of problem. However, it is still unclear
which method should be used, as various methods may lead to non-global optima. This paper reviews and compares several optimization
methods for two-mode partitioning. Several known methods are discussed, and a new fuzzy steps method is introduced. The fuzzy
steps method is based on the fuzzy c-means algorithm of Bezdek (1981) and the fuzzy steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performances of all methods are compared in a large simulation study. In our simulations, a two-mode k-means optimization method most often gives the best results. Finally, an empirical data set is used to give a practical example
of two-mode partitioning.
We would like to thank two anonymous referees whose comments have improved the quality of this paper. We are also grateful
to Peter Verhoef for providing the data set used in this paper.
4.
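On the Method of Continued Proportions for Circle Division (geyuan lian bili) in the Qing Dynasty (total citations: 1, self-citations: 0, citations by others: 1)
The method of continued proportions for circle division was the principal method used for infinite series in the Qing dynasty, but it has not yet been studied sufficiently. This paper attempts to examine its concepts and principles and, through analysis, points out the following: it is a recursive process with certain constraints, based on the successive-addition method of continued proportions; its structural features depend on the successive-addition method that is adopted, which serves as its inducing equation; and in its initial stage, because the successive-addition method had not yet been applied, its results reduce to an iterative process.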
5.
Michael P. Windham 《Journal of Classification》1985,2(1):157-172
An approach to numerical classification is described, which treats the assignment of objects to types as a continuous variable, called an assignment measure. Describing a classification by an assignment measure allows one not only to determine the types of objects, but also to see relationships among the objects of the same type and among the types themselves. A classification procedure, the Assignment-Prototype algorithm, is described and evaluated. It is a numerical technique for obtaining assignment measures directly from one-mode, two-way proximity matrices.
6.
Maurizio Vichi 《Journal of Classification》1999,16(1):27-44
X is the automatic hierarchical classification of one mode (units or variables or occasions) of X on the basis of the other two. In this paper the case of OMC of units according to variables and occasions is discussed. OMC is the synthesis of a set of hierarchical classifications Delta obtained from X; e.g., the OMC of units is the consensus (synthesis) among the set of dendrograms individually defined by clustering units on the basis of variables, separately for each given occasion of X. However, because Delta is often formed by a large number of classifications, it may be unrealistic that a single synthesis is representative of the entire set. In this case, subsets of similar (homogeneous) dendrograms may be found in Delta so that a consensus representative of each subset may be identified. This paper proposes PARtition and Least Squares Consensus cLassifications Analysis (PARLSCLA) of a set of r hierarchical classifications Delta. PARLSCLA identifies the best least-squares partition of Delta into m (1 ≤ m ≤ r) subsets of homogeneous dendrograms and simultaneously detects the closest consensus classification (a median classification called the Least Squares Consensus Dendrogram, LSCD) for each subset. PARLSCLA is a generalization of the problem of finding a least-squares consensus dendrogram for Delta. PARLSCLA is formalized as a mixed-integer programming problem and solved with an iterative, two-step algorithm. The proposed method is applied to an empirical data set.
7.
Michael P. Windham 《Journal of Classification》1987,4(2):191-214
The more ways there are of understanding a clustering technique, the more effectively the results can be analyzed and used. I will give a general procedure, called parameter modification, to obtain from a clustering criterion a variety of equivalent forms of the criterion. These alternative forms reveal aspects of the technique that are not necessarily apparent in the original formulation. This procedure is successful in improving the understanding of a significant number of clustering techniques. The insight obtained will be illustrated by applying parameter modification to partitioning, mixture and fuzzy clustering methods, resulting in a unified approach to the study of these methods and a general algorithm for optimizing them. The author wishes to thank Professor Doctor Hans-Hermann Bock for many stimulating discussions.
8.
Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm (total citations: 4, self-citations: 4, citations by others: 0)
This paper presents the development of a new methodology which simultaneously estimates in a least-squares fashion both an ultrametric tree and respective variable weightings for profile data that have been converted into (weighted) Euclidean distances. We first review the relevant classification literature on this topic. The new methodology is presented including the alternating least-squares algorithm used to estimate the parameters. The method is applied to a synthetic data set with known structure as a test of its operation. An application of this new methodology to ethnic group rating data is also discussed. Finally, extensions of the procedure to model additive, multiple, and three-way trees are mentioned. The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.
9.
Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement for the imputation procedure proposed by Lavori, Dawson, and Shera (1995), here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the "propensity score step" of ABP, randomly selects a donor in the nonrespondent's neighbourhood, which includes cases with response patterns similar to the one of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likert-type scales from a real data set shows that ABPN: (a) performed better than the ABP imputation, and (b) can be considered as a serious competitor of other procedures used in this context.
10.
We present a hierarchical classification based on n-ary relations of the entities. Starting from the finest partition that can be obtained from the attributes, we distinguish between entities having the same attributes by using relations between entities. The classification that we get is thus a refinement of this finest partition. It can be computed in O(n + m^2) space and O(n · p · m^(5/2)) time, where n is the number of entities, p the number of classes of the resulting hierarchy (p is the size of the output; p < 2n) and m the maximum number of relations an entity can have (usually, m ≪ n). So we can treat sets with millions of entities.
11.
Faicel Chamroukhi 《Journal of Classification》2016,33(3):374-411
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, which are curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, then yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach and optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives including regression mixture models and the K-means-like algorithm for piecewise regression demonstrate the effectiveness of the proposed approach.
12.
Classical unidimensional scaling poses a difficult combinatorial task. A procedure formulated as a nonlinear programming
(NLP) model is proposed to solve this problem. The new method can be implemented with standard mathematical programming software.
Unlike the traditional procedures that minimize either the sum of squared errors (L2 norm) or the sum of absolute errors (L1 norm), the proposed method can minimize the error based on any Lp norm for 1 ≤ p < ∞. Extensions of the NLP formulation to address a multidimensional scaling problem under the city-block model are also
discussed.
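As a reading aid, the error criterion mentioned in this abstract can be written out in a common textbook form. The abstract does not give the paper's exact NLP formulation, so the symbols here are assumptions: \(\delta_{ij}\) denotes the observed dissimilarity between objects \(i\) and \(j\), and \(x_i\) the coordinate of object \(i\) on the line.

```latex
\min_{x_1,\dots,x_n} \; \sum_{i<j} \bigl|\, \delta_{ij} - |x_i - x_j| \,\bigr|^{p},
\qquad 1 \le p < \infty
```

Setting \(p = 2\) gives the sum of squared errors and \(p = 1\) the sum of absolute errors, the two traditional criteria named in the abstract.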
13.
Nedret Billor Asheber Abebe Asuman Turkmen Sai V. Nudurupati 《Journal of Classification》2008,25(2):249-260
Suppose y, a d-dimensional (d ≥ 1) vector, is drawn from a mixture of k (k ≥ 2) populations, given by Π1, Π2, …, Πk. We wish to identify the population that is the most likely source of the point y. To solve this classification problem many classification rules have been proposed in the literature. In this study, a new
nonparametric classifier based on the transvariation probabilities of data depth is proposed. We compare the performance of
the newly proposed nonparametric classifier with classical and maximum depth classifiers using some benchmark and simulated
data sets.
The authors thank the editor and referees for comments that led to an improvement of this paper. This work is partially supported
by the National Science Foundation under Grant No. DMS-0604726.
14.
This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known;
however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a
contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row
and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was
described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution
procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions
for problems of practical size.
We would like to thank the Editor, Willem Heiser, and an anonymous reviewer for helpful comments that resulted in improvements
of this article.
15.
k-Adic Similarity Coefficients for Binary (Presence/Absence) Data (total citations: 1, self-citations: 1, citations by others: 0)
Matthijs J. Warrens 《Journal of Classification》2009,26(2):227-245
k-Adic formulations (for groups of objects of size k) of a variety of 2-adic similarity coefficients (for pairs of objects) for binary (presence/absence) data are presented.
The formulations are not functions of 2-adic similarity coefficients. Instead, the main objective of the paper is to present
k-adic formulations that reflect certain basic characteristics of, and have a similar interpretation as, their 2-adic versions.
Two major classes are distinguished. The first class is referred to as Bennani-Heiser similarity coefficients, which contains
all coefficients that can be defined using just the matches, the number of attributes that are present and that are absent
in k objects, and the total number of attributes. The coefficients in the second class can be formulated as functions of Dice’s
association indices.
The author thanks Willem Heiser and three anonymous reviewers for their helpful comments and valuable suggestions on earlier
versions of this article.
16.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.
17.
18.
19.
How to Get It. Diagrammatic Reasoning as a Tool of Knowledge Development and its Pragmatic Dimension
Michael H.G. Hoffmann 《Foundations of Science》2004,9(3):285-305
Discussions concerning belief revision, theory development, and "creativity" in philosophy and AI reveal a growing interest in Peirce's concept of abduction. Peirce introduced abduction in an attempt to provide theoretical dignity and clarification to the difficult problem of knowledge generation. He wrote that "An Abduction is Originary in respect to being the only kind of argument which starts a new idea" (Peirce, CP 2.26). These discussions, however, led to considerable debates about the precise way in which Peirce's abduction can be used to explain knowledge generation (cf. Magnani, 1999; Hoffmann, 1999). The crucial question is that of understanding how we can get the new elements capable of enlarging our theories. Under these circumstances, it might be helpful to step out of the entanglement and reconsider the basis of the problem that originally triggered Peirce's interest in abduction. This will lead us to another Peircean concept, that of "diagrammatic reasoning," which I discuss here in the context of his "pragmatism." In this way, I hope to reach a better understanding of the contribution of "abduction" to the knowledge generation process.
20.
A latent class vector model for preference ratings (total citations: 1, self-citations: 1, citations by others: 1)
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as
input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and
constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The
distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that
are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm
for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes
and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and
some possible extensions are discussed.
Geert De Soete is supported as "Bevoegdverklaard Navorser" of the Belgian "Nationaal Fonds voor Wetenschappelijk Onderzoek."