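An efficient discretization algorithm for continuous attributes
Cite this article: ZHAO Jing-xian, NI Chun-peng, ZHAN Yuan-rui, DU Zi-ping. An efficient discretization algorithm for continuous attributes [J]. 系统工程与电子技术 (Systems Engineering and Electronics), 2009, 31(1): 195-199
Authors: ZHAO Jing-xian  NI Chun-peng  ZHAN Yuan-rui  DU Zi-ping
Affiliations: 1. School of Management, Tianjin University, Tianjin 300072; 2. School of Economics and Management, Tianjin University of Science and Technology, Tianjin 300222
Abstract: The cut-point properties of the entropy-based discretization criterion are analyzed, and a discretization algorithm based on merging boundary-point attribute values and checking an inconsistency measure is proposed and proved. Compared with traditional discretization algorithms, it merges only the attribute values at boundary points, the number of cut points is generated automatically rather than set in advance, and the merging rules are simple to apply. This greatly reduces the computational cost and makes the algorithm suitable for discretizing large-scale, high-dimensional databases. Because the inconsistency measure is used to adjust the candidate cut-point set, the algorithm also takes a global view of the data. Experiments show that the algorithm effectively improves the conciseness and prediction accuracy of the classification rules.

Keywords: discretization  decision tree  data mining
Received: 2007-10-14
Revised: 2008-05-21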

Efficient discretization algorithm for continuous attributes
ZHAO Jing-xian, NI Chun-peng, ZHAN Yuan-rui, DU Zi-ping. Efficient discretization algorithm for continuous attributes [J]. Systems Engineering and Electronics, 2009, 31(1): 195-199
Authors: ZHAO Jing-xian  NI Chun-peng  ZHAN Yuan-rui  DU Zi-ping
Affiliation: 1. School of Management, Tianjin Univ., Tianjin 300072, China; 2. School of Economics and Management, Tianjin Univ. of Science & Technology, Tianjin 300222, China
Abstract: Based on an analysis of the cut-point characteristics of entropy-based discretization, an attribute discretization algorithm based on merging the attribute values of boundary points and on an inconsistency check is presented. Compared with traditional discretization algorithms, the proposed method merges only the attribute values of boundary points, generates the number of cut points automatically without setting it in advance, applies simple rules to merge intervals, and greatly reduces the computational cost. It is suitable for discretizing large-scale, high-dimensional databases. Because inconsistency is used to check the chosen cut-point set, the algorithm has a global property. Experiments show that the method improves the simplicity and prediction accuracy of the classification rules.
Keywords: discretization  decision tree  data mining
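
The two building blocks named in the abstract, boundary points and an inconsistency check, can be illustrated with a short Python sketch. This is not the authors' algorithm: the abstract does not give the merging rule or the inconsistency threshold. The sketch assumes the usual definitions, namely that a boundary point lies between two adjacent distinct attribute values whose instances do not all share one class, and that the inconsistency rate is the fraction of instances outside the majority class of their discretized pattern. The function names boundary_cut_points, discretize and inconsistency_rate are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def boundary_cut_points(values, labels):
    """Return midpoints between adjacent distinct values that separate
    instances of different classes (candidate cut points)."""
    classes_at = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Boundary: the two neighbouring values do not share one single class.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def discretize(value, cuts):
    """Map a continuous value to the index of its interval (cuts must be sorted)."""
    return bisect.bisect_right(cuts, value)


def inconsistency_rate(rows, labels):
    """Fraction of instances that are not in the majority class of their
    discretized pattern (assumed definition, not taken from the paper)."""
    by_pattern = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_pattern[tuple(row)][y] += 1
    mismatched = sum(sum(c.values()) - max(c.values()) for c in by_pattern.values())
    return mismatched / len(labels)


# Toy data for one attribute (illustrative values only).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "a"]

cuts = boundary_cut_points(values, labels)           # [3.5, 5.5]
rows = [(discretize(v, cuts),) for v in values]
print(cuts, inconsistency_rate(rows, labels))        # no inconsistency with all cuts

# Dropping a cut merges two intervals; the check reports the cost of doing so.
fewer = cuts[:1]
rows_merged = [(discretize(v, fewer),) for v in values]
print(fewer, inconsistency_rate(rows_merged, labels))
```

Only two of the five gaps between adjacent values are candidate cuts here, which is why restricting attention to boundary points reduces the work. The inconsistency rate is computed over whole rows, so with several attributes it reflects their joint effect rather than each attribute in isolation.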
This article is indexed in 万方数据 (Wanfang Data) and other databases.