A data discretization algorithm based on improved chi-square statistic
Cite this article: SANG Yu, LI Keqiu, YAN Deqin. A data discretization algorithm based on improved chi-square statistic[J]. Journal of Dalian University of Technology, 2012, 52(3): 443-447.
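Authors: SANG Yu, LI Keqiu, YAN Deqin
Affiliations: 1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
2. School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning 116029, China
Funding: Program for New Century Excellent Talents in University, Ministry of Education of China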
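Abstract: In discretization algorithms based on χ² statistical independence, the choice of the degrees of freedom and of the expected frequency directly affects the accuracy of the χ² computation, and therefore the performance of the discretization. To address this, a data discretization algorithm based on an improved χ² statistic is proposed, which raises the quality of discretization algorithms based on statistical independence. First, the shortcomings in how the degrees of freedom are chosen in the χ² function are analyzed, and a corrected scheme for choosing them is given. Second, an improved scheme for the expected frequency is proposed based on characteristics of the data such as its class distribution; it removes the defect of assigning the same expected frequency to different datasets and improves the accuracy of the χ² computation. Experimental results show that the improved method significantly increases the learning accuracy of the C4.5 decision tree and the Naive Bayes classifier.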

Keywords: discretization; data mining; χ² statistic
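For reference, the conventional χ² computation that the abstract sets out to improve can be sketched as below. This is a minimal sketch assuming the ChiMerge formulation, not the authors' method: for two adjacent intervals forming a 2 × k table of class counts, the expected frequency is E_ij = R_i · C_j / N, any E_ij = 0 is replaced by a fixed constant (0.1, the usual ChiMerge convention, independent of the dataset), and the degrees of freedom are k − 1. The abstract does not give the corrected degrees of freedom or the class-distribution-based expected frequency, so neither is implemented here. The names `chi_square_pair`, `chimerge`, `zero_expected` and `threshold` are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def chi_square_pair(row_a: Sequence[int], row_b: Sequence[int],
                    zero_expected: float = 0.1) -> Tuple[float, int]:
    """Conventional (ChiMerge-style) chi-square for two adjacent intervals.

    row_a[j] and row_b[j] are the counts of class j in each interval.
    Returns (chi2, degrees_of_freedom), with df = k - 1.
    """
    k = len(row_a)
    rows = (row_a, row_b)
    row_totals = [sum(r) for r in rows]                 # R_i
    col_totals = [row_a[j] + row_b[j] for j in range(k)]  # C_j
    n = sum(row_totals)                                 # N
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n if n else 0.0
            if expected == 0:
                # Fixed fallback, identical for every dataset (assumed 0.1).
                expected = zero_expected
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2, k - 1


def chimerge(values: Sequence[float], labels: Sequence[Hashable],
             classes: Sequence[Hashable], threshold: float,
             zero_expected: float = 0.1) -> List[float]:
    """Bottom-up merging: repeatedly merge the adjacent pair with the lowest
    chi-square until every pair reaches `threshold`. Returns the cut points
    (lower bounds of every interval except the first)."""
    index = {c: j for j, c in enumerate(classes)}
    counts = {v: [0] * len(classes) for v in sorted(set(values))}
    for v, y in zip(values, labels):
        counts[v][index[y]] += 1
    # Each interval is [lower_bound, class_counts]; start with one per value.
    intervals = [[v, counts[v]] for v in sorted(counts)]
    while len(intervals) > 1:
        scores = [chi_square_pair(intervals[i][1], intervals[i + 1][1],
                                  zero_expected)[0]
                  for i in range(len(intervals) - 1)]
        best = min(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            break
        lower, left = intervals[best]
        right = intervals[best + 1][1]
        intervals[best] = [lower, [a + b for a, b in zip(left, right)]]
        del intervals[best + 1]
    return [lower for lower, _ in intervals[1:]]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
    # 2.706 is the chi-square critical value for df = 1 at alpha = 0.1.
    print(chimerge(xs, ys, classes=["a", "b"], threshold=2.706))
```

On this toy input the sketch returns a single cut point at 5. Note how the fixed constant behaves: every class column that is empty in both intervals contributes exactly 0.2 (0.1 per cell) to the pair's χ², whatever the size or class distribution of the dataset. A data-independent constant of this kind is what the abstract describes as giving different datasets the same expected frequency.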

This article is indexed in databases including CNKI and Wanfang Data.