1.
John T. Daws 《Journal of Classification》1996,13(1):57-80
Free-sorting data are obtained when subjects are given a set of objects and are asked to divide them into subsets. Such data are usually reduced by counting, for each pair of objects, how many subjects placed both of them into the same subset. The present study examines the utility of a group of additional statistics: the cooccurrences of sets of three objects. Because there are dependencies among the pair and triple cooccurrences, adjusted triple similarity statistics are developed. Multidimensional scaling and cluster analysis, which usually use pair similarities as their input data, can be modified to operate on three-way similarities to create representations of the set of objects. Such methods are applied to a set of empirical sorting data: Rosenberg and Kim's (1975) fifteen kinship terms. The author thanks Phipps Arabie, Lawrence Hubert, Lawrence Jones, Ed Shoben, and Stanley Wasserman for their considerable contributions to this paper.
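The raw pair and triple counts described above are simple tallies. As a minimal sketch (not the paper's adjusted triple statistics, which correct for the dependencies between pair and triple cooccurrences), the counting step might look like this; the function name and data layout are illustrative:

```python
from itertools import combinations

def cooccurrence_counts(sortings):
    """Count, over all subjects, how often each pair and each triple of
    objects was placed in the same subset. `sortings` holds one partition
    per subject; each partition is a list of subsets of object labels."""
    pairs, triples = {}, {}
    for partition in sortings:
        for subset in partition:
            items = sorted(subset)
            for p in combinations(items, 2):
                pairs[p] = pairs.get(p, 0) + 1
            for t in combinations(items, 3):
                triples[t] = triples.get(t, 0) + 1
    return pairs, triples

# Two subjects free-sorting four objects:
subjects = [
    [{"a", "b", "c"}, {"d"}],
    [{"a", "b"}, {"c", "d"}],
]
pairs, triples = cooccurrence_counts(subjects)
# ("a", "b") co-occur for both subjects; ("a", "b", "c") only for the first.
```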
2.
Probabilistic feature models (PFMs) can be used to explain binary rater judgements about the associations between two types of elements (e.g., objects and attributes) on the basis of binary latent features. In particular, to explain observed object-attribute associations PFMs assume that respondents classify both objects and attributes with respect to a, usually small, number of binary latent features, and that the observed object-attribute association is derived as a specific mapping of these classifications. Standard PFMs assume that the object-attribute association probability is the same according to all respondents, and that all observations are statistically independent. As both assumptions may be unrealistic, a multilevel latent class extension of PFMs is proposed which allows object and/or attribute parameters to differ across latent rater classes, and which allows modeling dependencies between associations with a common object (attribute) by assuming that the link between features and objects (attributes) is fixed across judgements. Formal relationships with existing multilevel latent class models for binary three-way data are described. As an illustration, the models are used to study rater differences in product perception and to investigate individual differences in the situational determinants of anger-related behavior.
3.
Global Optimization in Any Minkowski Metric: A Permutation-Translation Simulated Annealing Algorithm for Multidimensional Scaling
It is well known that considering a non-Euclidean Minkowski metric in Multidimensional Scaling, either for the distance model or for the loss function, increases the computational problem of local minima considerably. In this paper, we propose an algorithm in which both the loss function and the composition rule can be considered in any Minkowski metric, using a multivariate randomly alternating Simulated Annealing procedure with permutation and translation phases. The algorithm has been implemented in Fortran and tested over classical and simulated data matrices with sizes up to 200 objects. A study has been carried out with some of the common loss functions to determine the most suitable values for the main parameters. The experimental results confirm the theoretical expectation that Simulated Annealing is a suitable strategy to deal by itself with the optimization problems in Multidimensional Scaling, in particular for City-Block, Euclidean and Infinity metrics.
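As a toy illustration of the translation phase only (the published algorithm alternates permutation and translation phases and is implemented in Fortran; every name and parameter choice below is illustrative), a simulated-annealing sketch for least-squares MDS under an arbitrary Minkowski metric:

```python
import math
import random

def minkowski(x, y, p):
    """Minkowski distance; p = inf gives the Infinity (Chebyshev) metric."""
    if math.isinf(p):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def stress(X, delta, p):
    """Raw least-squares loss between model distances and dissimilarities."""
    n = len(X)
    return sum((minkowski(X[i], X[j], p) - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def anneal_mds(delta, dim=2, p=1.0, iters=5000, t0=1.0, seed=0):
    rng = random.Random(seed)
    n = len(delta)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    s = stress(X, delta, p)
    best, best_s = [row[:] for row in X], s
    for k in range(iters):
        t = t0 * (1 - k / iters) + 1e-9              # linear cooling schedule
        i, d = rng.randrange(n), rng.randrange(dim)
        old = X[i][d]
        X[i][d] += rng.gauss(0, 0.1)                 # random translation move
        s_new = stress(X, delta, p)
        # Metropolis acceptance: always take improvements, sometimes worse moves
        if s_new < s or rng.random() < math.exp(min(0.0, (s - s_new) / t)):
            s = s_new
            if s < best_s:
                best, best_s = [row[:] for row in X], s
        else:
            X[i][d] = old
    return best, best_s

# City-Block dissimilarities of four collinear points, embedded in 1-D:
delta = [[abs(i - j) for j in range(4)] for i in range(4)]
best, bs = anneal_mds(delta, dim=1, p=1.0, iters=5000, seed=1)
```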
4.
Wojtek J. Krzanowski Trevor C. Bailey Derek Partridge Jonathan E. Fieldsend Richard M. Everson Vitaly Schetinin 《Journal of Classification》2006,23(2):199-220
Bayesian classification is currently of considerable interest. It provides a strategy for eliminating the uncertainty associated with a particular choice of classifier model parameters, and is the optimal decision-theoretic choice under certain circumstances when there is no single “true” classifier for a given data set. Modern computing capabilities can easily support the Markov chain Monte Carlo sampling that is necessary to carry out the calculations involved, but the information available in these samples is not at present being fully utilised. We show how it can be allied to known results concerning the “reject option” in order to produce an assessment of the confidence that can be ascribed to particular classifications, and how these confidence measures can be used to compare the performances of classifiers. Incorporating these confidence measures can alter the apparent ranking of classifiers as given by straightforward success or error rates. Several possible methods for obtaining confidence assessments are described, and compared on a range of data sets using the Bayesian probabilistic nearest-neighbour classifier.
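The “reject option” itself is easy to state: withhold a classification whenever the (e.g. MCMC-averaged) posterior class probability is not decisive. A minimal sketch, with an illustrative threshold:

```python
def classify_with_reject(posteriors, threshold=0.8):
    """Assign each case to the class with the highest posterior probability,
    but reject (return None) when that maximum falls below `threshold` --
    the classical reject option for flagging low-confidence decisions."""
    decisions = []
    for probs in posteriors:
        best = max(range(len(probs)), key=lambda k: probs[k])
        decisions.append(best if probs[best] >= threshold else None)
    return decisions

# Posterior class probabilities for three cases (e.g. averaged over MCMC samples):
post = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
print(classify_with_reject(post))  # prints [0, None, 1]: the middle case is rejected
```

The proportion of rejected cases at a given threshold is one natural confidence summary with which classifiers can be compared.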
5.
W.J. Krzanowski 《Journal of Classification》1998,15(1):81-92
A new set of derived variables is proposed for exhibiting grouped multivariate data in a small number of dimensions, in such a way as to highlight 'extremeness' of one or more groups relative to the rest of the data. Such display can provide a useful exploratory tool in multivariate ranking and selection problems. We explore four possible measures of 'extremeness', and suggest which one is best for practical application. We show that the technique can be used to derive either orthogonal or uncorrelated dimensions for any type of input data, and we give an illustrative example of its use.
6.
Ron Wehrens Lutgarde M.C. Buydens Chris Fraley Adrian E. Raftery 《Journal of Classification》2004,21(2):231-253
The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.
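The sample-then-extend idea can be sketched in miniature with a 1-D two-component Gaussian mixture (unit variances and equal mixing weights are assumed purely for brevity; the paper's setting is multivariate model-based clustering, and the data and helper names below are illustrative):

```python
import math
import random

def em_gauss2(data, mu, iters=20):
    """A few EM iterations for a two-component 1-D Gaussian mixture with
    unit variances and equal weights: only the two means are estimated."""
    for _ in range(iters):
        r = []                                    # E step: responsibilities
        for x in data:
            a = math.exp(-0.5 * (x - mu[0]) ** 2)
            b = math.exp(-0.5 * (x - mu[1]) ** 2)
            r.append(a / (a + b))
        mu = [sum(ri * x for ri, x in zip(r, data)) / sum(r),        # M step
              sum((1 - ri) * x for ri, x in zip(r, data))
              / sum(1 - ri for ri in r)]
    return mu

rng = random.Random(42)
full = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(5, 1) for _ in range(300)]
sample = rng.sample(full, 60)             # step 1: cluster a moderate-size sample
mu = em_gauss2(sample, [min(sample), max(sample)], iters=25)
mu = em_gauss2(full, mu, iters=5)         # step 2: a few EM steps on the full data
```

Running only the E step on the full data (classification without refitting) corresponds to the unstable "simplest form" the abstract warns about; the extra EM steps let the model adapt to the full data set.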
7.
Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.
8.
Carolyn J. Anderson 《Journal of Classification》2013,30(2):276-303
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010) that used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information of the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to data using SAS.
9.
To reveal the structure underlying two-way two-mode object by variable data, Mirkin (1987) has proposed an additive overlapping clustering model. This model implies an overlapping clustering of the objects and a reconstruction of the data, with the reconstructed variable profile of an object being a summation of the variable profiles of the clusters it belongs to. Grasping the additive (overlapping) clustering structure of object by variable data may, however, be seriously hampered in case the data include a very large number of variables. To deal with this problem, we propose a new model that simultaneously clusters the objects in overlapping clusters and reduces the variable space; as such, the model implies that the cluster profiles and, hence, the reconstructed data profiles are constrained to lie in a low-dimensional space. An alternating least squares (ALS) algorithm to fit the new model to a given data set will be presented, along with a simulation study and an illustrative example that makes use of empirical data.
10.
Given two or more dendrograms (rooted tree diagrams) based on the same set of objects, ways are presented of defining and obtaining common pruned trees. Bounds on the size of a largest common pruned tree are introduced, as is a categorization of objects according to whether they belong to all, some, or no largest common pruned trees. Also described is a procedure for regrafting pruned branches, yielding trees for which one can assess the reliability of the depicted relationships. The tree obtained by regrafting branches on to a largest common pruned tree is shown to contain all the classes present in the strict consensus tree. The theory is illustrated by application to two classifications of a set of forty-nine stratigraphical pollen spectra. This work was supported by the Science and Engineering Research Council. The authors are grateful to the referees for constructive criticisms of an earlier version of the paper, and to Dr. J.T. Henderson for advice on PASCAL.
11.
A mathematical programming approach to fitting general graphs
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.
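The network distance mentioned here, the minimum-path-length metric, can be computed for any connected weighted graph with the Floyd-Warshall all-pairs shortest-path recursion; a minimal sketch (the function name and example graph are illustrative):

```python
def minimum_path_lengths(n, links):
    """Minimum-path-length metric of a connected graph: all-pairs shortest
    path lengths over the given weighted links (Floyd-Warshall, O(n^3))."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in links:
        d[i][j] = d[j][i] = min(d[i][j], w)     # undirected links
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# A 4-node path graph 0-1-2-3 with unit-length links:
d = minimum_path_lengths(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
```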
12.
According to Vázquez and Liz (Found Sci 16(4): 383–391, 2011), Points of View (PoV) can be considered in two different ways. On the one hand, they can be explained following the model of propositional attitudes. This model assumes that the internal structure of a PoV is constituted by a subject, a set of contents, and a set of relations between the subject and those contents. On the other hand, we can analyze points of view taking as a model the notions of location and access. If we choose to follow the second approach, instead of the first one, the internal structure of a PoV is not directly addressed, and the emphasized features of PoV are related to the function that PoV are intended to have. That is, PoV are directly identified by their role and they can solely be understood as ways of accessing the world that bring some kind of perspective about it. Having this in mind, we would like to propose a notation that explains how to understand such access as a sort of model (one that can allow the creation of concepts), independently of whether the precise PoV under consideration is impersonal or non-impersonal, its kind of content, and its subjective or objective character. First, we will present an account of some previous approaches to the study of points of view. Then, we will analyze what kind of structure the world is assumed to possess and how the access to it is possible. Third, we will develop a notation that explains PoV as qualitative dimensions by means of which it is possible to valuate objects and states of the world.
13.
A. D. Gordon 《Journal of Classification》1990,7(2):257-269
The paper addresses the problem of specifying differential weights for variables in the construction of a measure of dissimilarity. An assessor is required to provide subjective judgments of the pairwise dissimilarities within a training set of objects, and these dissimilarities are then modeled as a function of the recorded differences between the objects on each of the variables. The aim is to make explicit the relative importance that assessors attach to each of the variables, and thus obtain guidance on how these variables should be combined into a relevant dissimilarity matrix. The methodology is illustrated by application to some archaeological data.
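A stripped-down version of this idea with just two variables: model the judged dissimilarities as a weighted sum of per-variable absolute differences and solve the normal equations for the weights. This is only an illustration (the paper's modeling is more general), and the data below are fabricated to satisfy the model exactly:

```python
def fit_two_weights(diffs, dissim):
    """Least-squares weights for d_ij ~ w1*a_ij + w2*b_ij, where (a_ij, b_ij)
    are per-variable absolute differences between objects i and j and d_ij
    is the assessor's judged dissimilarity. Solves the 2x2 normal equations."""
    saa = sum(a * a for a, b in diffs)
    sbb = sum(b * b for a, b in diffs)
    sab = sum(a * b for a, b in diffs)
    sad = sum(a * d for (a, b), d in zip(diffs, dissim))
    sbd = sum(b * d for (a, b), d in zip(diffs, dissim))
    det = saa * sbb - sab * sab
    return ((sad * sbb - sbd * sab) / det,
            (sbd * saa - sad * sab) / det)

# Judged dissimilarities generated by true weights (2, 0.5):
diffs = [(1.0, 2.0), (3.0, 1.0), (0.0, 4.0), (2.0, 2.0)]
dissim = [2 * a + 0.5 * b for a, b in diffs]
w1, w2 = fit_two_weights(diffs, dissim)
```

The fitted weights make explicit how much each variable contributed to the assessor's judgments.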
14.
A mixture likelihood approach for generalized linear models
A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is provided, and the model is illustrated in two empirical applications.
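The simplest instance of such a mixture is a two-component, intercept-only Poisson model fitted by EM; a minimal sketch with illustrative data and starting values (the paper's framework covers arbitrary exponential-family GLMs with covariates):

```python
import math

def poisson_mixture_em(counts, lam, pi=0.5, iters=50):
    """EM for a two-component Poisson mixture: the E step computes posterior
    membership probabilities, the M step updates the mixing weight and the
    two Poisson means (an intercept-only GLM with log link per class)."""
    for _ in range(iters):
        r = []                 # E step: posterior probability of component 0
        for y in counts:
            a = pi * math.exp(-lam[0]) * lam[0] ** y / math.factorial(y)
            b = (1 - pi) * math.exp(-lam[1]) * lam[1] ** y / math.factorial(y)
            r.append(a / (a + b))
        pi = sum(r) / len(r)   # M step: mixing weight and component means
        lam = [sum(ri * y for ri, y in zip(r, counts)) / sum(r),
               sum((1 - ri) * y for ri, y in zip(r, counts))
               / sum(1 - ri for ri in r)]
    return lam, pi

# Counts from two regimes (low-rate and high-rate):
counts = [0, 1, 1, 2, 1, 0, 9, 10, 8, 11, 9, 12]
lam, pi = poisson_mixture_em(counts, lam=[1.0, 8.0])
```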
15.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with other methods. More specifically, it is studied how these methods behave when order changes of components and sign reversals of component loadings occur, such as in case of near-equal eigenvalues, or data having almost as many counterindicative items as indicative items. The simulations show that other proposed methods either may run into serious problems or are not able to adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods will provide adequate estimates for the PCA loadings.
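To give a minimal flavor of Procrustes alignment: in two dimensions the optimal rotation aligning one loading matrix to a reference has a closed form. This sketch handles rotation only (the paper's Generalized Procrustes analysis aligns several matrices, handles reflections, and yields a centroid solution); the matrices below are illustrative:

```python
import math

def align_rotation_2d(A, B):
    """Rotate 2-column loading matrix B to best match reference A in the
    least-squares sense (2-D orthogonal Procrustes, rotation only)."""
    C = sum(a1 * b1 + a2 * b2 for (a1, a2), (b1, b2) in zip(A, B))
    S = sum(a2 * b1 - a1 * b2 for (a1, a2), (b1, b2) in zip(A, B))
    t = math.atan2(S, C)                  # closed-form optimal angle
    c, s = math.cos(t), math.sin(t)
    return [(b1 * c - b2 * s, b1 * s + b2 * c) for b1, b2 in B]

# B is A rotated by 90 degrees, mimicking component-order indeterminacy:
A = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
B = [(0.0, 1.0), (-1.0, 0.0), (-0.5, 0.5)]
aligned = align_rotation_2d(A, B)
```

After such alignment per imputed data set, the loadings can be averaged (centroid) and their spread summarized, e.g. by convex hulls.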
16.
Jacqueline J. Meulman 《Journal of Classification》1996,13(2):249-266
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in low-dimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables, and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is in the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points of view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a Jackknife analysis.
17.
Multiclass Functional Discriminant Analysis and Its Application to Gesture Recognition
We consider applying a functional logistic discriminant procedure to the analysis of handwritten character data. Time-course trajectories corresponding to the X and Y coordinate values of handwritten characters written in the air with one finger are converted into a functional data set via regularized basis expansion. We then apply functional logistic modeling to classify the functions into several classes. In order to select the values of adjusted parameters involved in the functional logistic model, we derive a model selection criterion for evaluating models estimated by the method of regularization. Results indicate the effectiveness of our modeling strategy in terms of prediction accuracy.
18.
George W. Furnas 《Journal of Classification》1984,1(1):187-233
Several techniques are given for the uniform generation of trees for use in Monte Carlo studies of clustering and tree representations. First, general strategies are reviewed for random selection from a set of combinatorial objects with special emphasis on two that use random mapping operations. Theorems are given on how the number of such objects in the set (e.g., whether the number is prime) affects which strategies can be used. Based on these results, methods are presented for the random generation of six types of binary unordered trees. Three types of labeling and both rooted and unrooted forms are considered. Presentation of each method includes the theory of the method, the generation algorithm, an analysis of its computational complexity and comments on the distribution of trees over which it samples. Formal proofs and detailed algorithms are in appendices.
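One standard way to sample uniformly from labeled unrooted trees, via random Prüfer sequences, is in the spirit of the random-mapping strategies reviewed; it is offered here as an illustration, not necessarily one of the paper's six methods (which concern binary unordered trees):

```python
import random

def random_labeled_tree(n, rng=None):
    """Uniform random labeled unrooted tree on n >= 2 nodes, decoded from a
    random Prufer sequence: each of the n^(n-2) labeled trees is equally
    likely to be produced."""
    rng = rng or random.Random()
    if n == 2:
        return [(0, 1)]
    seq = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n                      # each node appears deg-1 times in seq
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))           # join the smallest current leaf to v
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = (x for x in range(n) if degree[x] == 1)
    edges.append((u, w))                  # connect the final two nodes
    return edges

edges = random_labeled_tree(8, random.Random(3))
```

By the Prüfer correspondence the decoding always yields a tree: n-1 edges, connected, acyclic.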
19.
20.
Statistical properties of large published classifications
Eric W. Holman 《Journal of Classification》1992,9(2):187-210
Large published classifications typically consist of sets (called taxa) hierarchically arranged according to taxonomic rank. A statistical survey of 23 such classifications reveals the following distinctive properties. The pattern of mandatory and optional taxonomic ranks is similar to a Guttman scale. Mean taxon size (defined as the number of next-lower-rank taxa per higher-rank taxon) is a U-shaped function of mandatory rank, and averages about seven across ranks with no significant differences between classifications. The variability of taxon size is a decreasing function of mandatory rank. The generality of these properties across classifications suggests that they are determined by the psychology of the classification process. In contrast, there are significant differences between classifications in the variability of taxon size and in the prevalence of optional ranks, both of which are greater in biological than in nonbiological classifications. These differences may reflect the nature of the materials classified.
This research was supported by a research grant from the UCLA Academic Senate and by computer time from the UCLA Office of Academic Computing.
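Mean taxon size as defined above is straightforward to compute; a small sketch assuming a classification is given as nested dicts whose leaves are empty dicts (the representation is an assumption of this illustration):

```python
def mean_taxon_size(classification):
    """Mean taxon size: the average number of next-lower-rank taxa per
    higher-rank taxon, over all non-leaf taxa of a nested-dict hierarchy."""
    sizes = []
    def walk(taxon):
        if taxon:                      # non-leaf: record its number of children
            sizes.append(len(taxon))
            for child in taxon.values():
                walk(child)
    walk(classification)
    return sum(sizes) / len(sizes)

# A tiny two-rank classification: two families with 3 and 4 genera.
clf = {"fam1": {"g1": {}, "g2": {}, "g3": {}},
       "fam2": {"g4": {}, "g5": {}, "g6": {}, "g7": {}}}
m = mean_taxon_size(clf)  # sizes are [2, 3, 4], so the mean is 3.0
```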