We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology. The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.
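The effect of the weights on distributional equivalence can be illustrated with a small sketch. It is not the authors' implementation: it assumes the column weights are the column masses (column sums divided by the grand total, as in correspondence analysis), and the helper names `column_masses`, `logratio_row_distance` and `merge_columns` are hypothetical. The sketch computes the log-ratio distance between two rows and checks that merging two proportional columns leaves the weighted distance unchanged, while the unweighted distance changes.

```python
import math


def column_masses(table):
    """Column sums divided by the grand total (assumed choice of weights)."""
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    return [sum(row[j] for row in table) / total for j in range(n_cols)]


def logratio_row_distance(table, i, k, weights=None):
    """Log-ratio distance between rows i and k of a strictly positive table.

    With weights=None every column gets equal weight (unweighted analysis).
    """
    n_cols = len(table[0])
    if weights is None:
        weights = [1.0 / n_cols] * n_cols
    # Log of the ratio between the two rows, column by column.
    diffs = [math.log(table[i][j]) - math.log(table[k][j]) for j in range(n_cols)]
    # Centring across columns removes any overall scale difference between rows.
    mean = sum(w * d for w, d in zip(weights, diffs))
    return math.sqrt(sum(w * (d - mean) ** 2 for w, d in zip(weights, diffs)))


def merge_columns(table, a, b):
    """Replace columns a and b by their sum, appended as the last column."""
    merged = []
    for row in table:
        new_row = [v for j, v in enumerate(row) if j not in (a, b)]
        new_row.append(row[a] + row[b])
        merged.append(new_row)
    return merged


# Column 1 is exactly twice column 0, so the two columns are proportional.
table = [[10.0, 20.0, 5.0], [4.0, 8.0, 9.0], [6.0, 12.0, 2.0]]
merged = merge_columns(table, 0, 1)

weighted_before = logratio_row_distance(table, 0, 1, column_masses(table))
weighted_after = logratio_row_distance(merged, 0, 1, column_masses(merged))
unweighted_before = logratio_row_distance(table, 0, 1)
unweighted_after = logratio_row_distance(merged, 0, 1)
```

In this toy table the weighted distances before and after merging agree, because the merged column simply inherits the combined mass of the two proportional columns; with equal weights the two proportional columns are counted twice before merging and once after.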

In this paper, we present empirical and theoretical results on classification trees for randomized response data. We considered a dichotomous sensitive response variable with the true status intentionally misclassified by the respondents using rules prescribed by a randomized response method. We assumed that classification trees are grown using the Pearson chi-square test as a splitting criterion, and that the randomized response data are analyzed using classification trees as if they were not perturbed. We proved that classification trees analyzing observed randomized response data and estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. This is illustrated using two real data sets.
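The abstract does not say which randomized response design is used. As a minimal sketch, assuming Warner's classic design (each respondent answers the sensitive question with probability `p_truth` and its negation otherwise), the relationship between the observed "yes" rate and the true prevalence is linear, which gives some intuition for why observed and estimated true data can be closely related. The function names are hypothetical.

```python
def observed_yes_probability(true_prevalence, p_truth):
    """Probability of a 'yes' answer under Warner's design (assumed here)."""
    return p_truth * true_prevalence + (1 - p_truth) * (1 - true_prevalence)


def estimate_true_prevalence(observed_yes_rate, p_truth):
    """Invert the linear relationship to recover the true prevalence."""
    if p_truth == 0.5:
        # With p_truth = 0.5 the answers carry no information about the truth.
        raise ValueError("p_truth must differ from 0.5")
    return (observed_yes_rate - (1 - p_truth)) / (2 * p_truth - 1)


# Round trip: perturb a known prevalence, then recover it.
observed = observed_yes_probability(true_prevalence=0.2, p_truth=0.7)
recovered = estimate_true_prevalence(observed, p_truth=0.7)
```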

Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.

In this paper we propose the concept of structural similarity as a relaxation of blockmodeling in social network analysis. Most previous approaches attempt to relax the constraints on partitions, for instance, that of being a structural or regular equivalence to being approximately structural or regular, respectively. In contrast, our approach is to relax the partitions themselves: structural similarities yield similarity values instead of equivalence or non-equivalence of actors, while strictly obeying the requirement made for exact regular equivalences. Structural similarities are based on a vector space interpretation and yield efficient spectral methods that, in a more restrictive manner, have been successfully applied to difficult combinatorial problems such as graph coloring. While traditional blockmodeling approaches have to rely on local search heuristics, our framework yields algorithms that are provably optimal for specific data-generation models. Furthermore, the stability of structural similarities can be well characterized, making them suitable for the analysis of noisy or dynamically changing network data.

Ordered set theory provides efficient tools for the problems of comparison and consensus of classifications. Here, an overview of results obtained by the ordinal approach is presented. Latticial or semilatticial structures of the main sets of classification models are described. Many results on partitions are adaptable to dendrograms; many results on n-trees hold in any median semilattice and thus have counterparts on ordered trees and Buneman (phylogenetic) trees. For the comparison of classifications, the semimodularity of the ordinal structures involved yields computable least-move metrics based on weighted or unweighted elementary transformations. In the unweighted case, these metrics have simple characteristic properties. For the consensus of classifications, the constructive, axiomatic, and optimization approaches are considered. Natural consensus rules (majoritary, oligarchic, ...) have adequate ordinal formalizations. A unified presentation of Arrow-like characterization results is given. In the cases of n-trees, ordered trees and Buneman trees, the majority rule is a significant example where the three approaches converge. The authors would like to thank the anonymous referees for helpful suggestions on the first draft of this paper, and W. H. E. Day for his comments and his significant improvements of style.

We devise a classification algorithm based on generalized linear mixed model (GLMM) technology. The algorithm incorporates spline smoothing, additive model-type structures and model selection. For reasons of speed we employ the Laplace approximation, rather than Monte Carlo methods. Tests on real and simulated data show the algorithm to have good classification performance. Moreover, the resulting classifiers are generally interpretable and parsimonious.

Two algorithms for pyramidal classification, a generalization of hierarchical classification, are presented that can work with incomplete dissimilarity data. These approaches, a modification of the pyramidal ascending classification algorithm and a least squares based penalty method, are described and compared using two different types of complete dissimilarity data in which randomly chosen dissimilarities are assumed missing and the non-missing ones are subjected to random error. We also consider relationships between hierarchical classification and pyramidal classification solutions when both are based on incomplete dissimilarity data.

Improvements to the dynamic programming (DP) strategy for partitioning (nonhierarchical classification) as discussed in Hubert, Arabie, and Meulman (2001) are proposed. First, it is shown how the number of evaluations in the DP process can be decreased without affecting generality. Both a completely nonredundant and a quasi-nonredundant method are proposed. Second, an efficient implementation of both approaches is discussed. This implementation is shown to have a dramatic increase in speed over the original program. The flexibility of the approach is illustrated by analyzing three data sets.

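Through a reading of the original literature, this paper traces the 80-year history of the gauge theory of gravitation, clarifies the confusion surrounding the definition of the translational potential, and identifies the connections and differences between the gauge theory of gravitation and Yang-Mills gauge theory. In Yang-Mills theory the gauge potential is an Ehresmann connection on a principal bundle, whereas in the gauge theory of gravitation the gravitational potential is a Cartan connection in Cartan geometry.

The name and concept of "sanjiao" (三焦) are unique to traditional Chinese medicine and have no counterpart in Western medicine. The existing renderings, namely literal translation, transliteration, and free translation, each have limitations. Drawing on the hermeneutic theory of translation, this article analyzes the deeper meaning of the term at the micro level and proposes combining morpheme translation with transliteration, so that the translated term preserves the character of Chinese medical culture while remaining reasonably readable.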

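It is well known that, to avoid ambiguity or misunderstanding, English for science and technology stresses precision, clarity, and objectivity of expression. Likewise, the translation of scientific and technical terms should, as far as possible, yield target-language equivalents that are scientific, logical, accurate, and rigorous. In other words, technical translation also demands precise, clear statement with as little ambiguity as possible. In practice, however, from the standpoint of conceptual expression, the target-language equivalent of a technical term does not always achieve full equivalence in the course of translation. Term translation therefore sometimes needs to draw on principles of "fuzzy" treatment to reproduce the semantic content carried by the source text.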

A study of standardization of variables in cluster analysis
A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated which varied the intercluster spacing and the scales for the variables. The data sets were examined in four different types of error environments. These involved error free data, error perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of true cluster structure as found by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. It was found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure. The result held over different error conditions, separation distances, clustering methods, and coverage levels. The traditional z-score transformation was found to be less effective in several situations.
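The two kinds of standardization contrasted above can be sketched as follows. The abstract does not list the eight strategies, so the exact forms here are assumptions: `divide_by_range` divides each value by the variable's range, and `z_score` subtracts the mean and divides by the sample standard deviation. The data values are invented for illustration.

```python
import statistics


def divide_by_range(values):
    """Scale a variable so that max - min equals 1 (assumed range-based form)."""
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("variable is constant; range is zero")
    return [v / span for v in values]


def z_score(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("variable is constant; standard deviation is zero")
    return [(v - mean) / sd for v in values]


# Two variables on very different scales (made-up numbers).
income = [20000.0, 35000.0, 50000.0, 80000.0]
age = [25.0, 40.0, 31.0, 58.0]

income_by_range = divide_by_range(income)
age_by_range = divide_by_range(age)
income_z = z_score(income)
```

After either transformation the two variables contribute on comparable scales to a Euclidean distance; which transformation recovers cluster structure better is the empirical question the study addresses.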

Unique parametrizations of models are very important for parameter interpretation and consistency of estimators. In this paper we analyze the identifiability of a general class of finite mixtures of multinomial logits with varying and fixed effects, which includes the popular multinomial logit and conditional logit models. The application of the general identifiability conditions is demonstrated on several important special cases and relations to previously established results are discussed. The main results are illustrated with a simulation study using artificial data and a marketing dataset of brand choices.

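This study sets its assumptions according to the realities of the international space industry, taking the price and production of expendable launch vehicles as variables and environmental factors as parameters. Two models of the international commercial space launch market are proposed: one for the market faced by a single firm, and one for a duopoly. Because the two models explain the commercial launch market from different angles, using them together yields a fuller understanding of that market. The models also show good predictive power for the market's future direction.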

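As China's reform and opening-up deepens and globalization accelerates, every sector in China faces the need to deal with one or several other countries. Translation, especially translation among multiple languages, is therefore both important and necessary, yet the relevant translation research lags seriously behind. This article discusses strategies for multilingual term translation under three headings: (1) non-equivalence among terms in multiple languages is an objective fact; (2) translation strategies for addressing that non-equivalence; (3) prospects for multilingual term translation.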

The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
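The link to Gaussian kernels can be made concrete with one transformation that is commonly cited as a member of this class, namely D -> 1 - exp(-rate * D) applied to squared Euclidean distances. Its choice here is an assumption, not something stated in the abstract, and the function names are hypothetical. The sketch checks numerically that the transformed value is half the squared distance induced by a Gaussian kernel, k(x, x) + k(y, y) - 2 k(x, y).

```python
import math


def squared_euclidean(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def exponential_transform(d_squared, rate=1.0):
    """Assumed example of a Schoenberg transformation: 1 - exp(-rate * D)."""
    return 1.0 - math.exp(-rate * d_squared)


def gaussian_kernel(x, y, rate=1.0):
    """Gaussian kernel exp(-rate * ||x - y||^2)."""
    return math.exp(-rate * squared_euclidean(x, y))


def kernel_squared_distance(x, y, rate=1.0):
    """Squared distance between x and y in the kernel's feature space."""
    return (gaussian_kernel(x, x, rate) + gaussian_kernel(y, y, rate)
            - 2.0 * gaussian_kernel(x, y, rate))


x, y = (0.0, 1.0), (2.0, 0.5)
transformed = exponential_transform(squared_euclidean(x, y), rate=0.5)
induced = kernel_squared_distance(x, y, rate=0.5)
```

Because the kernel-induced quantity is a squared Euclidean distance in a feature space, the agreement (up to the factor 2) shows the transformed dissimilarities are again squared Euclidean, which is the property the abstract attributes to the class.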

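When the organizations undertaking a project (or research topic) proactively disclose information on inventions made in the course of government-funded science and technology projects, project management authorities can obtain the necessary materials and data on research results promptly and effectively, which matters for raising the level of government innovation management. From a legislative standpoint, a sound disclosure system for the results of national science and technology programs requires work on two fronts. First, substantive law should specify the timing, content, and form in which undertaking organizations disclose information on government-funded inventions, as well as the liability for breaching the disclosure obligation. Second, procedural provisions matching the substantive ones are needed to ensure that the substantive provisions are carried out.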

The Gaussian distribution has for several decades been ubiquitous in the theory and practice of statistical classification. Despite the early proposals motivating the use of predictive inference to design a classifier, this approach has gained relatively little attention apart from certain specific applications, such as speech recognition where its optimality has been widely acknowledged. Here we examine statistical properties of different inductive classification rules under a generic Gaussian model and demonstrate the optimality of considering simultaneous classification of multiple samples under an attractive loss function. It is shown that the simpler independent classification of samples leads asymptotically to the same optimal rule as the simultaneous classifier when the amount of training data increases, if the dimensionality of the feature space is bounded in an appropriate manner. Numerical investigations suggest that the simultaneous predictive classifier can lead to higher classification accuracy than the independent rule in the low-dimensional case, whereas the simultaneous approach suffers more from noise when the dimensionality increases.

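The article proposes a relative principle of equivalence. Emphasizing a descriptive approach and applying conceptual blending and other related cognitive theories, it describes the cognitive processes behind three problems in translated terms: inconsistent equivalence, inaccurate equivalence, and erroneous equivalence. It identifies the translator's differing modes of cognition, the mediating nature of prior knowledge, and intuitive modes of experiential construction as the main causes of these three translation problems.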
