
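Decision Tree Algorithm Based on Rough Set
Cite this article: Qiao Mei, Han WenXiu. Decision Tree Algorithm Based on Rough Set [J]. Journal of Tianjin University (Science and Technology), 2005, 38(9): 842-846.
Authors: Qiao Mei; Han WenXiu
Affiliations: [1] School of Management, Tianjin University, Tianjin 300072 [2] Department of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300191
Funding: Tianjin Municipal Education Commission Higher Education Science and Technology Development Fund (020714); Tianjin University of Technology Science and Technology Development Fund research project (LG030291)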
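Abstract: Classical Rough-set-based classification algorithms, such as value reduction algorithms, are not suitable for large data sets. To address this, a decision tree algorithm based on Rough sets is proposed. A new attribute selection measure, the attribute classification rough degree, serves as the heuristic for choosing attributes. Compared with Rough set measures of attribute relevance such as the positive region, it characterizes an attribute's overall contribution to classification more comprehensively, and it is simpler to compute than information gain and information gain ratio. A new pruning method, predictive pruning, is adopted: before the attribute selection computation, the initial partition pattern that an attribute induces on the data is revised on the basis of the variable precision positive region, so that the influence of noisy data on attribute selection and leaf node generation is eliminated more effectively. A simple and effective method for detecting and handling inconsistent data, tightly integrated with the decision tree algorithm, is also adopted, so the algorithm handles both consistent and inconsistent data effectively. Mining results on several data sets from the UCI machine learning repository show that the decision trees generated by the algorithm are smaller than those of ID3 and comparable in size to those generated by a decision tree algorithm that uses information gain ratio as its heuristic. The algorithm generates decision trees or classification rules in which every leaf node satisfies a given minimum confidence and support, is easy to implement with database technology, and is suitable for large data sets.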

Keywords: Rough set; decision tree; attribute classification rough degree; predictive pruning; inconsistent data
Article ID: 0493-2137(2005)09-0842-05
Received: 2004-03-26
Revised: 2004-10-10
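
The abstract states that every leaf node meets a given minimum confidence and support, and that inconsistent data are detected and handled during tree construction. The paper's exact definitions are not given here, so the following is only a minimal sketch under stated assumptions: confidence is taken as the majority-class share of the rows at a node, support as the majority-class rows divided by all training rows, and "inconsistent" as rows that agree on every condition attribute but differ in the decision attribute. The function names `leaf_stats` and `find_inconsistent` are hypothetical.

```python
from collections import Counter, defaultdict


def leaf_stats(labels, total_rows):
    """Return (majority_class, confidence, support) for the rows at one node.

    Assumption: confidence = majority rows / rows at the node,
    support = majority rows / all training rows.
    """
    counts = Counter(labels)
    majority, hits = counts.most_common(1)[0]
    return majority, hits / len(labels), hits / total_rows


def find_inconsistent(rows, condition_attrs, decision_attr):
    """Map each condition-value tuple to its decision values when they conflict."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        groups[key].add(row[decision_attr])
    # Keep only the groups that carry more than one decision value.
    return {key: values for key, values in groups.items() if len(values) > 1}


# Toy data, invented purely for illustration.
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},   # conflicts with the row above
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
conflicts = find_inconsistent(toy_rows, ["outlook", "windy"], "play")
majority, confidence, support = leaf_stats(["y", "y", "y", "n"], total_rows=10)
```

A node would become a leaf in this sketch only if `confidence` and `support` both reach user-chosen thresholds; how the paper combines the two tests is not stated in the abstract.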

Decision Tree Algorithm Based on Rough Set
Qiao Mei; Han WenXiu. Decision Tree Algorithm Based on Rough Set [J]. Journal of Tianjin University (Science and Technology), 2005, 38(9): 842-846.
Authors: Qiao Mei; Han WenXiu
Abstract: Classical Rough-set-based classification algorithms, such as value reduction algorithms, are not suitable for large data sets, so this paper proposes a decision tree algorithm based on Rough sets. The algorithm uses a novel measure, the attribute classification rough degree, as the heuristic for choosing the attribute at a tree node. This measure captures an attribute's contribution to classification more comprehensively than other Rough set measures and is simpler to calculate than information gain and information gain ratio. The algorithm adopts a new pruning method, predictive pruning, which uses variable precision positive areas to revise the partition pattern that an attribute induces on the data set at a tree node before the attribute selection calculation, thus eliminating more effectively the effect of noisy data on attribute choice and leaf node generation. The algorithm handles inconsistent data with a simple and efficient method that is tightly integrated with the decision tree algorithm, so it can deal with both consistent and inconsistent data efficiently. Mining results on 6 data sets from the UCI machine learning repository show that the trees generated by the algorithm are smaller than those generated by ID3 and comparable in size to those generated by a decision tree algorithm that uses information gain ratio as its heuristic. The algorithm can directly generate decision trees or classification rule sets and is easy to implement with database technology, which makes it suitable for large data sets.
Keywords: Rough set; decision tree; attribute classification rough degree; predictive pruning; inconsistent data
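
The abstract compares the new measure with the Rough set positive region and bases predictive pruning on a variable precision positive area. As a rough illustration of those two standard notions (not of the paper's attribute classification rough degree, whose definition is not given here), the sketch below computes the positive region of a decision attribute with respect to a set of condition attributes. It assumes the common variable precision formulation in which an equivalence class belongs to the positive region when its majority decision share is at least a threshold `beta`; `beta = 1.0` gives the classical positive region. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, attrs):
    """Partition row indices by their values on the given attributes."""
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(index)
    return list(blocks.values())


def positive_region(rows, attrs, decision, beta=1.0):
    """Row indices in blocks whose majority decision share is at least beta."""
    region = set()
    for block in equivalence_classes(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        share = counts.most_common(1)[0][1] / len(block)
        if share >= beta:
            region.update(block)
    return region


def dependency(rows, attrs, decision, beta=1.0):
    """Fraction of rows that fall inside the positive region."""
    return len(positive_region(rows, attrs, decision, beta)) / len(rows)


# Toy data, invented for illustration: one noisy "sunny" row.
toy = (
    [{"outlook": "sunny", "play": "no"}] * 3
    + [{"outlook": "sunny", "play": "yes"}]      # noise
    + [{"outlook": "rain", "play": "no"}, {"outlook": "rain", "play": "yes"}]
    + [{"outlook": "overcast", "play": "yes"}]
)
strict = positive_region(toy, ["outlook"], "play")             # classical, beta = 1.0
relaxed = positive_region(toy, ["outlook"], "play", beta=0.75)  # tolerates the noisy row
```

With the strict threshold only the "overcast" row is classified with certainty; relaxing `beta` to 0.75 also admits the "sunny" block despite its single noisy row, which is the kind of noise tolerance the abstract attributes to the variable precision positive area.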
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.