
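An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data
Cite this article: LIU Zhen-yu, SONG Xiao-ying. An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data[J]. Journal of Northeastern University (Natural Science), 2020, 41(11): 1521-1527.
Authors: LIU Zhen-yu  SONG Xiao-ying
Affiliations: (1. Software Center, Northeastern University, Shenyang 110819, Liaoning, China; 2. Key Laboratory of Network Security and Computing Technology, Dalian Neusoft University of Information, Dalian 116023, Liaoning, China)
Funding: National Natural Science Foundation of China (61772101, 61602075); Key Research and Development Program of Liaoning Province (2018).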
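Abstract: Most multivariate decision trees can combine only numerical attributes and cannot directly classify data sets that contain categorical attributes. To address this, a multivariate decision tree algorithm that can combine attributes of multiple types (CMDT) is proposed. The algorithm counts the frequency distribution of each categorical attribute's values within each class or each cluster, and uses these statistics to define the center of a sample set on the categorical attributes and the distance from a sample to that center. A weighted k-means algorithm is then used to split the non-terminal nodes of the decision tree. Trees built with this node-splitting method can be applied to numerical, categorical, and mixed data. Experimental results show that, on data sets of all these types, the classification models built by the algorithm achieve higher generalization accuracy and simpler tree structures than classic decision tree algorithms.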

Keywords: decision tree  categorical attribute  multivariate decision tree  node split  k-means
Received: 2019-10-24
Revised: 2019-10-24

An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data
LIU Zhen-yu, SONG Xiao-ying. An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data[J]. Journal of Northeastern University (Natural Science), 2020, 41(11): 1521-1527.
Authors:LIU Zhen-yu  SONG Xiao-ying
Institution:1. Software Center, Northeastern University, Shenyang 110819, China; 2. Key Laboratory of Network Security and Computing Technology, Dalian Neusoft University of Information, Dalian 116023, China.
Abstract: Most multivariate decision trees are applicable only to numerical data. To solve the classification problem on data with categorical attributes, a multivariate decision tree algorithm (CMDT) is proposed. The center of a sample set on the categorical attributes, and the distance between a sample and a center, are defined from the frequency distribution of categorical attribute values in each class or each cluster. A weighted k-means algorithm is used to split the nodes of the decision tree. The proposed multivariate decision tree is applicable to numerical data, categorical data, and mixed data. Experimental results show that classification models built by the proposed algorithm have more concise tree structures and higher generalization accuracy than those built by classic decision tree algorithms on all of these kinds of data.
Keywords:decision tree  categorical attribute  multivariate decision tree  node split  k-means  
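
The frequency-based center and distance mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the center is taken to be the per-attribute relative frequency table of a sample set, and the distance is taken to be a weighted sum of (1 - frequency of the sample's value). The function names `categorical_center`, `distance_to_center`, and `assign` are hypothetical, and this is not the authors' CMDT implementation.

```python
from collections import Counter


def categorical_center(samples):
    """Center of a sample set: one {value: relative frequency} table per attribute.

    Assumption: the "center" is simply the frequency distribution itself.
    """
    n = len(samples)
    n_attrs = len(samples[0])
    center = []
    for j in range(n_attrs):
        counts = Counter(row[j] for row in samples)
        center.append({value: c / n for value, c in counts.items()})
    return center


def distance_to_center(sample, center, weights=None):
    """Assumed distance: weighted sum over attributes of (1 - frequency).

    A value never seen in the set has frequency 0 and contributes its full weight.
    """
    if weights is None:
        weights = [1.0] * len(center)
    return sum(
        w * (1.0 - freq.get(value, 0.0))
        for value, freq, w in zip(sample, center, weights)
    )


def assign(samples, centers, weights=None):
    """One k-means-style assignment step: index of the nearest center per sample."""
    return [
        min(range(len(centers)),
            key=lambda k: distance_to_center(s, centers[k], weights))
        for s in samples
    ]


if __name__ == "__main__":
    # Toy data: two categorical attributes (color, size), two groups.
    group_a = [("red", "small"), ("red", "large"), ("blue", "small")]
    group_b = [("green", "large"), ("green", "large")]
    centers = [categorical_center(group_a), categorical_center(group_b)]
    print(assign([("red", "small"), ("green", "large")], centers))
```

In a node split along these lines, the assignment step would alternate with recomputing the centers until the partition stops changing, and the per-attribute weights would play the role of the weighting in weighted k-means. How CMDT actually chooses those weights and handles numerical attributes is not stated in the abstract.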