Similar literature
20 similar articles found (search time: 828 ms)
1.
Free-sorting data are obtained when subjects are given a set of objects and are asked to divide them into subsets. Such data are usually reduced by counting, for each pair of objects, how many subjects placed both of them into the same subset. The present study examines the utility of a group of additional statistics: the co-occurrences of sets of three objects. Because there are dependencies among the pair and triple co-occurrences, adjusted triple similarity statistics are developed. Multidimensional scaling and cluster analysis, which usually use pair similarities as their input data, can be modified to operate on three-way similarities to create representations of the set of objects. Such methods are applied to a set of empirical sorting data: Rosenberg and Kim's (1975) fifteen kinship terms. The author thanks Phipps Arabie, Lawrence Hubert, Lawrence Jones, Ed Shoben, and Stanley Wasserman for their considerable contributions to this paper.
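
The reduction described in this abstract can be illustrated with a minimal sketch. It counts raw pair and triple co-occurrences only; the adjusted triple statistics are not specified in the abstract and are not attempted here. The function name `cooccurrence_counts` and the sample sorts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sorts, k):
    """Count, for every k-subset of objects, how many subjects
    placed all k objects into the same pile."""
    counts = Counter()
    for piles in sorts:  # one subject's partition of the objects
        for pile in piles:
            # Sort the pile so each subset has one canonical key.
            for combo in combinations(sorted(pile), k):
                counts[combo] += 1
    return counts


# Hypothetical data: three subjects sorting four kinship terms.
sorts = [
    [{"mother", "father"}, {"son", "daughter"}],
    [{"mother", "father", "son", "daughter"}],
    [{"mother", "daughter"}, {"father", "son"}],
]

pair_counts = cooccurrence_counts(sorts, 2)    # usual pair similarities
triple_counts = cooccurrence_counts(sorts, 3)  # three-way co-occurrences
```

Only the second subject contributes to any triple here, which shows how sparse triple counts can be relative to pair counts.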

2.
Probabilistic feature models (PFMs) can be used to explain binary rater judgements about the associations between two types of elements (e.g., objects and attributes) on the basis of binary latent features. In particular, to explain observed object-attribute associations, PFMs assume that respondents classify both objects and attributes with respect to a usually small number of binary latent features, and that the observed object-attribute association is derived as a specific mapping of these classifications. Standard PFMs assume that the object-attribute association probability is the same for all respondents, and that all observations are statistically independent. As both assumptions may be unrealistic, a multilevel latent class extension of PFMs is proposed which allows object and/or attribute parameters to differ across latent rater classes, and which allows dependencies between associations with a common object (attribute) to be modeled by assuming that the link between features and objects (attributes) is fixed across judgements. Formal relationships with existing multilevel latent class models for binary three-way data are described. As an illustration, the models are used to study rater differences in product perception and to investigate individual differences in the situational determinants of anger-related behavior.

3.
It is well known that considering a non-Euclidean Minkowski metric in Multidimensional Scaling, either for the distance model or for the loss function, increases the computational problem of local minima considerably. In this paper, we propose an algorithm in which both the loss function and the composition rule can be considered in any Minkowski metric, using a multivariate randomly alternating Simulated Annealing procedure with permutation and translation phases. The algorithm has been implemented in Fortran and tested over classical and simulated data matrices with sizes up to 200 objects. A study has been carried out with some of the common loss functions to determine the most suitable values for the main parameters. The experimental results confirm the theoretical expectation that Simulated Annealing is a suitable strategy to deal by itself with the optimization problems in Multidimensional Scaling, in particular for City-Block, Euclidean and Infinity metrics.
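
For readers unfamiliar with the three metrics named at the end of this abstract, the following minimal sketch computes a Minkowski distance for a given exponent p, where p = 1 is City-Block, p = 2 is Euclidean, and p = infinity is the Infinity (dominance) metric. It illustrates the distance family only, not the authors' Simulated Annealing procedure; the function name is hypothetical.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance between two equal-length coordinate
    sequences for exponent p >= 1 (p may be math.inf)."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Infinity metric: the largest coordinate difference.
        return max(diffs, default=0.0)
    if p < 1:
        raise ValueError("p must be at least 1 for a metric")
    return sum(d ** p for d in diffs) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
city_block = minkowski_distance(a, b, 1)
euclidean = minkowski_distance(a, b, 2)
infinity = minkowski_distance(a, b, math.inf)
```

For these two points the City-Block distance is the sum of the coordinate differences, the Euclidean distance is the length of the straight line, and the Infinity distance is the larger of the two differences.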

4.
Bayesian classification is currently of considerable interest. It provides a strategy for eliminating the uncertainty associated with a particular choice of classifier model parameters, and is the optimal decision-theoretic choice under certain circumstances when there is no single “true” classifier for a given data set. Modern computing capabilities can easily support the Markov chain Monte Carlo sampling that is necessary to carry out the calculations involved, but the information available in these samples is not at present being fully utilised. We show how it can be allied to known results concerning the “reject option” in order to produce an assessment of the confidence that can be ascribed to particular classifications, and how these confidence measures can be used to compare the performances of classifiers. Incorporating these confidence measures can alter the apparent ranking of classifiers as given by straightforward success or error rates. Several possible methods for obtaining confidence assessments are described, and compared on a range of data sets using the Bayesian probabilistic nearest-neighbour classifier.
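
The "reject option" mentioned in this abstract can be sketched in its simplest generic form: average the class probabilities over posterior draws, then decline to classify when the largest averaged probability falls below a threshold. This is an illustrative assumption about how such a rule might look, not the confidence measures developed in the paper; the function names, the threshold values, and the sample draws are all hypothetical.

```python
def average_class_probabilities(sampled_probs):
    """Average per-draw class probabilities.
    sampled_probs: list of dicts, one per posterior draw,
    each mapping class label -> probability."""
    labels = sampled_probs[0].keys()
    n = len(sampled_probs)
    return {c: sum(draw[c] for draw in sampled_probs) / n for c in labels}


def classify_with_reject(probs, threshold):
    """Return the most probable label, or None (reject) when its
    probability is below the threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None


# Hypothetical class probabilities from two posterior draws.
draws = [{"a": 0.9, "b": 0.1}, {"a": 0.7, "b": 0.3}]
averaged = average_class_probabilities(draws)

accepted = classify_with_reject(averaged, threshold=0.75)
rejected = classify_with_reject(averaged, threshold=0.95)
```

Raising the threshold trades coverage for reliability: fewer cases are classified, but those that are classified carry higher averaged probability.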

5.
A new set of derived variables is proposed for exhibiting grouped multivariate data in a small number of dimensions, in such a way as to highlight 'extremeness' of one or more groups relative to the rest of the data. Such a display can provide a useful exploratory tool in multivariate ranking and selection problems. We explore four possible measures of 'extremeness', and suggest which one is best for practical application. We show that the technique can be used to derive either orthogonal or uncorrelated dimensions for any type of input data, and we give an illustrative example of its use.

6.
The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.

7.
Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.

8.
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010) that used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information about the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to data using SAS.

9.
To reveal the structure underlying two-way two-mode object by variable data, Mirkin (1987) has proposed an additive overlapping clustering model. This model implies an overlapping clustering of the objects and a reconstruction of the data, with the reconstructed variable profile of an object being a summation of the variable profiles of the clusters it belongs to. Grasping the additive (overlapping) clustering structure of object by variable data may, however, be seriously hampered when the data include a very large number of variables. To deal with this problem, we propose a new model that simultaneously clusters the objects in overlapping clusters and reduces the variable space; as such, the model implies that the cluster profiles and, hence, the reconstructed data profiles are constrained to lie in a low-dimensional space. An alternating least squares (ALS) algorithm to fit the new model to a given data set is presented, along with a simulation study and an illustrative example that makes use of empirical data.

10.
Given two or more dendrograms (rooted tree diagrams) based on the same set of objects, ways are presented of defining and obtaining common pruned trees. Bounds on the size of a largest common pruned tree are introduced, as is a categorization of objects according to whether they belong to all, some, or no largest common pruned trees. Also described is a procedure for regrafting pruned branches, yielding trees for which one can assess the reliability of the depicted relationships. The tree obtained by regrafting branches onto a largest common pruned tree is shown to contain all the classes present in the strict consensus tree. The theory is illustrated by application to two classifications of a set of forty-nine stratigraphical pollen spectra. This work was supported by the Science and Engineering Research Council. The authors are grateful to the referees for constructive criticisms of an earlier version of the paper, and to Dr. J.T. Henderson for advice on PASCAL.

11.
A mathematical programming approach to fitting general graphs (total citations: 1; self-citations: 1; citations by others: 0)
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.
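
The minimum-path-length metric referred to in this abstract can be illustrated with a short sketch that computes all pairwise network distances on a small weighted graph. It uses the standard Floyd-Warshall procedure, which is a swapped-in textbook technique and not the penalty-function fitting algorithm of the paper; the function name, the example links, and their lengths are hypothetical.

```python
import math


def network_distances(n, links):
    """All-pairs minimum path lengths on an undirected graph.
    n: number of nodes (objects), labeled 0..n-1.
    links: dict mapping (i, j) -> non-negative link length."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), length in links.items():
        # Undirected link: store the shorter length in both directions.
        d[i][j] = d[j][i] = min(d[i][j], length)
    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Hypothetical network with four objects and four links.
links = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 5.0}
dist = network_distances(4, links)
```

Dropping the link (0, 3) leaves three links for four objects, which corresponds to the limiting tree case mentioned in the abstract; the distance between objects 0 and 3 is unchanged because the direct link was longer than the path through objects 1 and 2.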

12.
According to Vázquez and Liz (Found Sci 16(4): 383–391, 2011), Points of View (PoV) can be considered in two different ways. On the one hand, they can be explained following the model of propositional attitudes. This model assumes that the internal structure of a PoV is constituted by a subject, a set of contents, and a set of relations between the subject and those contents. On the other hand, we can analyze points of view taking as a model the notions of location and access. If we choose to follow the second approach instead of the first one, the internal structure of a PoV is not directly addressed, and the emphasized features of PoV are related to the function that PoV are intended to have. That is, PoV are directly identified by their role, and they can solely be understood as ways of accessing the world that bring some kind of perspective about it. Having this in mind, we would like to propose a notation that explains how to understand such access as a sort of model (that can allow the creation of concepts), independently of whether the precise PoV under consideration is impersonal or non-impersonal, its kind of content, and its subjective or objective character. First, we will present an account of some previous approaches to the study of points of view. Then, we will analyze what kind of structure the world is assumed to possess and how access to it is possible. Third, we will develop a notation that explains PoV as qualitative dimensions by means of which it is possible to valuate objects and states of the world.

13.
The paper addresses the problem of specifying differential weights for variables in the construction of a measure of dissimilarity. An assessor is required to provide subjective judgments of the pairwise dissimilarities within a training set of objects, and these dissimilarities are then modeled as a function of the recorded differences between the objects on each of the variables. The aim is to make explicit the relative importance that assessors attach to each of the variables, and thus obtain guidance on how these variables should be combined into a relevant dissimilarity matrix. The methodology is illustrated by application to some archaeological data.

14.
A mixture likelihood approach for generalized linear models (total citations: 6; self-citations: 0; citations by others: 6)
A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is provided, and the model is illustrated in two empirical applications.

15.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with other methods. More specifically, we study how these methods behave when order changes of components and sign reversals of component loadings occur, such as in the case of near-equal eigenvalues, or data having almost as many counterindicative items as indicative items. The simulations show that other proposed methods either may run into serious problems or are not able to adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods will provide adequate estimates for the PCA loadings.

16.
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in low-dimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables; and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points of view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a jackknife analysis.

17.
We consider applying a functional logistic discriminant procedure to the analysis of handwritten character data. Time-course trajectories corresponding to the X and Y coordinate values of handwritten characters written in the air with one finger are converted into a functional data set via regularized basis expansion. We then apply functional logistic modeling to classify the functions into several classes. In order to select the values of adjusted parameters involved in the functional logistic model, we derive a model selection criterion for evaluating models estimated by the method of regularization. Results indicate the effectiveness of our modeling strategy in terms of prediction accuracy.

18.
Several techniques are given for the uniform generation of trees for use in Monte Carlo studies of clustering and tree representations. First, general strategies are reviewed for random selection from a set of combinatorial objects with special emphasis on two that use random mapping operations. Theorems are given on how the number of such objects in the set (e.g., whether the number is prime) affects which strategies can be used. Based on these results, methods are presented for the random generation of six types of binary unordered trees. Three types of labeling and both rooted and unrooted forms are considered. Presentation of each method includes the theory of the method, the generation algorithm, an analysis of its computational complexity and comments on the distribution of trees over which it samples. Formal proofs and detailed algorithms are in appendices.

19.
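Based on statistical data published by the Institute of Scientific and Technical Information of China, this paper presents a statistical and comparative analysis of Tianjin's international and domestic scientific and technical papers for 2005–2007, covering total counts, distribution by discipline, citations, the number of fund-supported papers, and the totals and national rankings of papers from Tianjin universities, in order to reflect the current status and level of scientific and technical papers in Tianjin.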

20.
Statistical properties of large published classifications (total citations: 1; self-citations: 1; citations by others: 0)
Large published classifications typically consist of sets (called taxa) hierarchically arranged according to taxonomic rank. A statistical survey of 23 such classifications reveals the following distinctive properties. The pattern of mandatory and optional taxonomic ranks is similar to a Guttman scale. Mean taxon size (defined as the number of next-lower-rank taxa per higher-rank taxon) is a U-shaped function of mandatory rank, and averages about seven across ranks with no significant differences between classifications. The variability of taxon size is a decreasing function of mandatory rank. The generality of these properties across classifications suggests that they are determined by the psychology of the classification process. In contrast, there are significant differences between classifications in the variability of taxon size and in the prevalence of optional ranks, both of which are greater in biological than in nonbiological classifications. These differences may reflect the nature of the materials classified. This research was supported by a research grant from the UCLA Academic Senate and by computer time from the UCLA Office of Academic Computing.
