Similar literature
 Found 20 similar documents; search took 31 ms
1.
Classification Using Class Cover Catch Digraphs   Total citations: 2, self-citations: 0, citations by others: 2
class cover catch digraphs based on proximity between training observations. Performance comparisons are presented on synthetic and real examples versus k-nearest neighbors, Fisher's linear discriminant and support vector machines. We demonstrate that the proposed semiparametric classifier has performance approaching that of the optimal parametric classifier in cases for which the optimal is available for comparison.

2.
In this paper we show how biplot methodology can be combined with various forms of discriminant analysis, leading to highly informative visual displays of the respective class separations. It is demonstrated that the concept of distance as applied to discriminant analysis provides a unified approach to a wide variety of discriminant analysis procedures, which can be accommodated simply by changing to an appropriate distance metric. These changes in the distance metric are crucial for the construction of appropriate biplots. Several new types of biplots, viz. quadratic discriminant analysis biplots for use with heteroscedastic stratified data, discriminant subspace biplots and flexible discriminant analysis biplots, are derived and their use illustrated. Advantages of the proposed procedures are pointed out. Although biplot methodology is particularly well suited to complementing discrimination problems with J > 2 classes, its use in 2-class problems is also illustrated.

3.
In many statistical applications, data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because they are observed at discrete time points, data of this type are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve this type of problem. In order to predict the class membership (multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Baseline-category logit formulations are considered, with estimation based on basis expansions of the sample curves of the functional predictor and of the parameters. Functional principal component analysis is used to obtain an accurate estimate of the functional parameters and to classify sample curves into the categories of the response variable. The performance of the proposed methodology is studied in an experiment with simulated and real data.

4.
The nearest neighbor interchange (nni) metric is a distance measure providing a quantitative measure of dissimilarity between two unrooted binary trees with labeled leaves. The metric has a transparent definition in terms of a simple transformation of binary trees, but its use in nontrivial problems is usually prevented by the absence of a computationally efficient algorithm. Since recent attempts to discover such an algorithm continue to be unsuccessful, we address the complementary problem of designing an approximation to the nni metric. Such an approximation should be well-defined, efficient to compute, comprehensible to users, relevant to applications, and a close fit to the nni metric; the challenge, of course, is to compromise among these objectives in such a way that the final design is acceptable to users with practical and theoretical orientations. We describe an approximation algorithm that appears to satisfy these objectives adequately. The algorithm requires O(n) space to compute dissimilarity between binary trees with n labeled leaves; it requires O(n log n) time for rooted trees and O(n^2 log n) time for unrooted trees. To help the user interpret the dissimilarity measures based on this algorithm, we describe empirical distributions of dissimilarities between pairs of randomly selected trees for both rooted and unrooted cases. The Natural Sciences and Engineering Research Council of Canada partially supported this work with Grant A-4142.

5.
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.

6.
In supervised learning, an important issue usually not taken into account by classical methods is that a class represented in the test set may not have been encountered earlier in the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which can detect several unobserved groups of points and can adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation, and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex and real-world problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis.

7.
In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions. We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following four models: (i) arbitrary covariance matrices, (ii) common principal components, (iii) proportional covariance matrices, and (iv) identical covariance matrices. Using Monte Carlo simulation to estimate expected error rates, we study the performance of the four discrimination procedures for five different parameter setups corresponding to standard situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for two to four groups. Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong. The authors wish to thank the editor and three referees for their helpful comments on the first draft of this article. M. J. Schmid was supported by grants no. 2.724-0.85 and 2.038-0.86 of the Swiss National Science Foundation.

8.
We propose a new nonparametric family of oscillation heuristics for improving linear classifiers in the two-group discriminant problem. The heuristics are motivated by the intuition that the classification accuracy of a separating hyperplane can be improved through small perturbations to its slope and position, accomplished by substituting training observations near the hyperplane for those used to generate it. In an extensive simulation study, using data generated from multivariate normal distributions under a variety of conditions, the oscillation heuristics consistently improve upon the classical linear and logistic discriminant functions, as well as two published linear programming-based heuristics and a linear Support Vector Machine. Added to any of the methods above, they approach, and frequently attain, the best possible accuracy on the training samples, as determined by a mixed-integer programming (MIP) model, at a much smaller computational cost. They also improve expected accuracy on the overall populations when the populations overlap significantly and the heuristics are trained with large samples, at least in situations where the data conditions do not explicitly favor a particular classifier.

9.
In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
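The two limiting cases named in this abstract can be illustrated with a minimal Python sketch. It assumes one common form of a power transformation for compositions, z_i = (D * x_i^α / Σ_j x_j^α − 1) / α for a D-part composition; the abstract does not give the formula, so this form, the function name `alpha_transform`, and the toy composition are all assumptions, and any further projection step the authors may apply is omitted.

```python
import math

def alpha_transform(x, alpha):
    """Hypothetical helper: power-transform a composition, then centre and scale.

    Assumed form: z_i = (D * x_i**alpha / sum_j x_j**alpha - 1) / alpha.
    For alpha == 0 the limit is used, which is the centred log-ratio.
    """
    d = len(x)
    if alpha == 0:
        # Limiting case: centred log-ratio transformation.
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / d
        return [value - mean_log for value in logs]
    powered = [v ** alpha for v in x]
    total = sum(powered)
    return [(d * p / total - 1.0) / alpha for p in powered]

# Toy composition (made up for illustration); its parts sum to one.
composition = [0.2, 0.3, 0.5]

euclidean_like = alpha_transform(composition, 1.0)  # a linear map of x: D*x - 1
log_ratio = alpha_transform(composition, 0.0)       # centred log-ratio
near_zero = alpha_transform(composition, 1e-6)      # numerically close to log_ratio
```

Under this assumed form, α = 1 only shifts and rescales the raw proportions, so Euclidean methods see the data essentially unchanged, while α near 0 approaches the log-ratio geometry; intermediate values interpolate between the two.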

10.
Consider N entities to be classified, with given weights, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity in that cluster and an entity outside it. The single-linkage algorithm provides partitions into M clusters for which the smallest split is maximum. We consider the problems of finding maximum split partitions with exactly M clusters and with at most M clusters subject to the additional constraint that the sum of the weights of the entities in each cluster never exceeds a given bound. These two problems are shown to be NP-hard and reducible to a sequence of bin-packing problems. A (N^2) algorithm for the particular case M = N of the second problem is also presented. Computational experience is reported. Acknowledgments: Work of the first author was supported in part by AFOSR grants 0271 and 0066 to Rutgers University and was done in part during a visit to GERAD, Ecole Polytechnique de Montréal, whose support is gratefully acknowledged. Work of the second and third authors was supported by NSERC grant GP0036426 and by FCAR grant 89EQ4144. We are grateful to Silvano Martello and Paolo Toth for making available to us their program MTP for the bin-packing problem and to three anonymous referees for comments which helped to improve the presentation of the paper.
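The definitions in this abstract (split of a cluster, smallest split of a partition, and the weight bound) can be written out as a small Python sketch. It only evaluates a given partition; it is not the authors' algorithm, and the function names, dissimilarity matrix, and weights are made up for illustration.

```python
def cluster_split(dissimilarity, cluster):
    """Smallest dissimilarity between an entity in `cluster` and one outside it.

    Assumes the partition has at least two clusters, so `outside` is non-empty.
    """
    inside = set(cluster)
    outside = [j for j in range(len(dissimilarity)) if j not in inside]
    return min(dissimilarity[i][j] for i in inside for j in outside)

def smallest_split(dissimilarity, partition):
    """Smallest split over all clusters of the partition."""
    return min(cluster_split(dissimilarity, c) for c in partition)

def is_weight_feasible(weights, partition, bound):
    """True if no cluster's total weight exceeds the bound."""
    return all(sum(weights[i] for i in c) <= bound for c in partition)

# Toy symmetric dissimilarity matrix and weights (hypothetical data).
d = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
weights = [2, 2, 3, 1]
partition = [[0, 1], [2, 3]]

split_value = smallest_split(d, partition)                  # 3 for this toy data
feasible = is_weight_feasible(weights, partition, bound=4)  # True: both clusters weigh 4
```

A maximum split partition in the sense of the abstract would be one that maximizes `smallest_split` among the partitions for which `is_weight_feasible` holds.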

11.
A maximum likelihood methodology for clusterwise linear regression   Total citations: 9, self-citations: 0, citations by others: 9
This paper presents a conditional mixture, maximum likelihood methodology for performing clusterwise linear regression. This new methodology simultaneously estimates separate regression functions and membership in K clusters or groups. Related procedures are reviewed and critiqued. The conditional mixture, maximum likelihood methodology is introduced together with the E-M algorithm utilized for parameter estimation. A Monte Carlo analysis is performed via a fractional factorial design to examine the performance of the procedure. Next, a marketing application is presented concerning the evaluations of trade show performance by senior marketing executives. Finally, other potential applications and directions for future research are identified.

12.
In this paper, I show the complementarity of foundationalism and coherentism with respect to any efficient system of beliefs by means of a distinction between two types of proposition drawn from an analogy with an axiomatic system. This distinction is based on the way a given proposition is acknowledged as true, either by declaration (F-proposition) or by preservation (C-proposition). Within such a perspective, i.e., epistemological complementarism, not only can one see how the usual opposition between foundationalism and coherentism is irrelevant, but furthermore one can appreciate the reciprocal relation between these two theories as they refer to two separate epistemological functions involved in the dynamics of constituting and expanding an epistemic system.
Yves Bouchard

13.
This paper was written with two aims in mind. A large part of it is just an exposition of Tarski's theory of truth. Philosophers do not agree on how Tarski's theory is related to their investigations. Some of them doubt whether that theory has any relevance to philosophical issues and in particular whether it can be applied in dealing with the problems of philosophy (theory) of science. In this paper I argue that Tarski's chief concern was the following question. Suppose a language L belongs to the class of languages for which, in full accordance with some formal conditions set in advance, we are able to define the class of all the semantic interpretations the language may acquire. Every interpretation of L can be viewed as a certain structure to which the expressions of the language may refer. Suppose that a specific interpretation of the language L was singled out as the intended one. Suppose, moreover, that the intended interpretation can be characterized in a metalanguage L+. If the above assumptions are satisfied, can the notion of truth for L be defined in the metalanguage L+ and, if it can, how can this be done?

14.
Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques incorporating principal component analysis and K-means into a unified methodology to obtain a reduced set of components for variables and an optimal partition for objects. RKM finds clusters in a reduced space by maximizing the between-clusters deviance without imposing any condition on the within-clusters deviance, so that clusters are isolated but they might be heterogeneous. On the other hand, FKM identifies clusters in a reduced space by minimizing the within-clusters deviance without imposing any condition on the between-clusters deviance. Thus, clusters are homogeneous, but they might not be isolated. The two techniques give different results because the total deviance in the reduced space for the two methodologies is not constant; hence the minimization of the within-clusters deviance is not equivalent to the maximization of the between-clusters deviance. In this paper a modification of the two techniques is introduced to avoid the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology. It is called Factor Discriminant K-means (FDKM), because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and its performance in a simulation study. An application to real-world data is presented to show the features of FDKM.

15.
The process of abstraction and concretisation is a label used for an explicative theory of scientific model-construction. In scientific theorising this process enters at various levels. We could identify two principal levels of abstraction that are useful to our understanding of theory-application. The first level is that of selecting a small number of variables and parameters abstracted from the universe of discourse and used to characterise the general laws of a theory. In classical mechanics, for example, we select position and momentum and establish a relation between the two variables, which we call Newton’s 2nd law. The specification of the unspecified elements of scientific laws, e.g. the force function in Newton’s 2nd law, is what would establish the link between the assertions of the theory and physical systems. In order to unravel how and with what conceptual resources scientific models are constructed, how they function and how they relate to theory, we need a view of theory-application that can accommodate our constructions of representation models. For this we need to expand our understanding of the process of abstraction to also explicate the process of specifying force functions etc. This is the second principal level at which abstraction enters in our theorising and on which I focus. In this paper, I attempt to elaborate a general analysis of the process of abstraction and concretisation involved in scientific model-construction, and argue why it provides an explication of the construction of models of the nuclear structure.

16.
MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software and the R language. It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the possible addition of a Poisson noise term. Also included are functions that combine hierarchical clustering, EM and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation, and discriminant analysis. MCLUST provides functionality for displaying and visualizing clustering and classification results. A web page with related links can be found at .

17.
18.
19.
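Drawing on the forms of the tower cart (楼车) and the cloud ladder (云梯) recorded in pre-Qin texts, this paper argues that the two in fact served the same function, namely observing enemy forces. The citation of the Bingfa (《兵法》) by the Han-dynasty scholar Fu in connection with the Zuozhuan (《左传》) provides a piece of evidence that has so far received little scholarly attention. Because the two devices bear different names, later writers assumed that they were unrelated; on the basis of the classical record, this paper proposes that the so-called "flying tower" (飞楼) was a turret mounted on the cloud ladder for observing the enemy, which shows that the two must be closely related.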

20.
NP-hard Approximation Problems in Overlapping Clustering   Total citations: 1, self-citations: 1, citations by others: 0
Lp-norm (p < ∞). These problems also correspond to the approximation by a strongly Robinson dissimilarity or by a dissimilarity fulfilling the four-point inequality (Bandelt 1992; Diatta and Fichet 1994). The results are extended to circular strongly Robinson dissimilarities, indexed k-hierarchies (Jardine and Sibson 1971, pp. 65-71), and to proper dissimilarities satisfying the Bertrand and Janowitz (k + 2)-point inequality (Bertrand and Janowitz 1999). Unidimensional scaling (linear or circular) is reinterpreted as a clustering problem and its hardness is established, but only for the L1 norm.
