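The Study on Classification of C4.5 Algorithms with GINI Index
Cite this article:NIE Bin, LI Huan, LUO Jigen, DU Jianqiang, ZHOU Li, HUANG Qiang. The Study on Classification of C4.5 Algorithms with GINI Index[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2019, 0(5): 469-472.
Authors:NIE Bin  LI Huan  LUO Jigen  DU Jianqiang  ZHOU Li  HUANG Qiang
Institution:School of Computer Science, Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi 330004
Abstract:The information gain ratio favors attributes with few distinct values and tends to produce unbalanced partitions, whereas the GINI index favors attributes with many distinct values and partitions whose intervals tend to be balanced. On this basis, this paper proposes an improved C4.5 algorithm fused with the GINI index: it first computes the information gain ratio and the GINI index of each candidate attribute, then computes the ratio of the information gain ratio to the GINI index, and finally selects the attribute with the largest ratio as the splitting node, which remedies a shortcoming of the C4.5 algorithm. With the accuracy of 10 runs of 10-fold cross-validation and the running time as evaluation metrics, the improved algorithm is tested on 5 UCI data sets and compared experimentally with the ID3, C4.5, and CART algorithms. The experimental results show that the C4.5 algorithm fused with the GINI index lessens the influence of the number of attribute values on the choice of splitting node and eases the imbalance of the partition intervals; it improves classification accuracy and running efficiency, and the algorithm is more stable, feasible, and effective.

Keywords:C4.5 algorithm  GINI index  decision tree  traditional Chinese medicine information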

The Study on Classification of C4.5 Algorithms with GINI Index
NIE Bin,LI Huan,LUO Jigen,DU Jianqiang,ZHOU Li,HUANG Qiang.The Study on Classification of C4.5 Algorithms with GINI Index[J].Journal of Jiangxi Normal University (Natural Sciences Edition),2019,0(5):469-472.
Authors:NIE Bin  LI Huan  LUO Jigen  DU Jianqiang  ZHOU Li  HUANG Qiang
Institution:School of Computer Science,Jiangxi University of Traditional Chinese Medicine,Nanchang Jiangxi 330004,China
Abstract:The information gain ratio tends to favor attributes with fewer distinct values and to produce imbalanced partitions. The GINI index tends to favor attributes with more distinct values and to produce balanced partitions. Based on this, an improved C4.5 algorithm combining the GINI index is proposed. The algorithm first calculates the information gain ratio and the GINI index of each candidate attribute, and then calculates the ratio of the information gain ratio to the GINI index. Finally, the attribute with the largest ratio is selected as the splitting node, which addresses a shortcoming of the C4.5 algorithm. Using the accuracy of ten runs of 10-fold cross-validation and the running time as evaluation metrics, the performance of the improved algorithm is tested on five UCI data sets and compared with the ID3, C4.5, and CART algorithms. The results show that the C4.5 algorithm combining the GINI index reduces the influence of the number of attribute values on the selection of splitting nodes and alleviates the imbalance of the partition intervals, which improves classification accuracy and running efficiency. The algorithm is more stable, feasible, and effective.
Keywords:C4.5 algorithm  GINI index  decision tree  information of Chinese medicine
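
The attribute-selection rule summarized in the abstract (compute the information gain ratio and the GINI index of each candidate attribute, take their ratio, pick the largest) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not give the exact formulas, so the code assumes categorical attributes with multiway splits, takes the GINI index of an attribute to be the size-weighted Gini impurity of its partitions, and adds a small constant EPS to the denominator so that a perfectly pure split does not divide by zero. All function names, EPS, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

EPS = 1e-9  # assumption: guards against division by zero for pure splits


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group class labels by the value each row takes on attribute attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return list(groups.values())


def gain_ratio(rows, labels, attr):
    """C4.5 gain ratio: information gain divided by split information."""
    n = len(labels)
    groups = partition(rows, labels, attr)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups)
    return gain / split_info if split_info > 0 else 0.0


def gini_index(rows, labels, attr):
    """Assumed definition: size-weighted Gini impurity after splitting on attr."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(rows, labels, attr))


def fused_score(rows, labels, attr):
    """Ratio of gain ratio to GINI index (EPS in the denominator is an assumption)."""
    return gain_ratio(rows, labels, attr) / (gini_index(rows, labels, attr) + EPS)


def best_attribute(rows, labels, attrs):
    """Pick the candidate attribute with the largest fused score."""
    return max(attrs, key=lambda a: fused_score(rows, labels, a))


# Toy data (hypothetical): "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

On this toy data the sketch selects "outlook", since it has a high gain ratio and a low weighted Gini impurity at the same time. How the paper treats a zero GINI index, continuous attributes, or missing values is not stated in the abstract, so those cases are left out here.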