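Research on a Text Classification Rule Extraction Method Based on Genetic Algorithm and Information Entropy
Cite this article: TANG Hua, ZENG Bi-qing. Research on a Text Classification Rule Extraction Method Based on Genetic Algorithm and Information Entropy[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2007, 46(5): 18-21,24
Authors: TANG Hua  ZENG Bi-qing
Affiliation: Department of Computer Engineering, Nanhai Campus, South China Normal University, Foshan 528225, Guangdong, China
Abstract: To address the text classification problem in data mining, this paper proposes Genetic-Miner (GM), a text classification rule extraction algorithm based on a genetic algorithm and information entropy. The goal of the algorithm is to discover classification rules in data sets. It first uses information entropy to generate the initial population and then uses an optimized genetic algorithm to extract the corresponding rules. GM is compared with two well-known algorithms of the same kind, Ant-Miner and CN2, on six standard public-domain data sets. The experimental results show that GM clearly outperforms Ant-Miner and CN2 in both predictive accuracy and rule simplicity, and that the algorithm can greatly improve the comprehensibility of the discovered knowledge.

Keywords: text classification rules  knowledge discovery  information entropy  genetic algorithm  data mining
Article ID: 0529-6579(2007)05-0018-05
Revised: 2007-01-09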

Research on Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy
TANG Hua,ZENG Bi-qing. Research on Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2007, 46(5): 18-21,24
Authors:TANG Hua  ZENG Bi-qing
Affiliation: Computer Engineering Department of Nanhai Campus, South China Normal University, Foshan 528225, China
Abstract: To address text classification problems in data mining, a classification rule extraction method based on a genetic algorithm and information entropy, called Genetic-Miner (GM), is proposed. The goal of GM is to discover classification rules in data sets. It first generates the initial population using information entropy and then extracts classification rules with a genetic algorithm. GM was compared with two other well-known algorithms, Ant-Miner and CN2, on six public-domain data sets. The results show that GM performs better than Ant-Miner and CN2 in both predictive accuracy and rule-list simplicity. It can also substantially improve the comprehensibility of the discovered knowledge.
Keywords: text classification rule  data mining  knowledge discovery  information entropy  genetic algorithm
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
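
The abstract states only that GM uses information entropy to generate the initial population; it does not give the procedure. The Python sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it computes Shannon entropy and information gain in the standard way, then seeds one-term rules by drawing attributes in proportion to their gain. The function names, the rule encoding as an (attribute, value) pair, and the toy data set are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Reduction in class entropy obtained by splitting on attribute `attr`."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def seed_population(rows, labels, size, rng):
    """Hypothetical seeding step (an assumption, not taken from the paper):
    each individual is a one-term rule (attribute, value), with attributes
    drawn in proportion to their information gain."""
    attrs = list(rows[0].keys())
    # A small constant keeps every weight positive so sampling never fails.
    gains = [information_gain(rows, labels, a) + 1e-9 for a in attrs]
    population = []
    for _ in range(size):
        attr = rng.choices(attrs, weights=gains, k=1)[0]
        value = rng.choice(sorted({row[attr] for row in rows}))
        population.append((attr, value))
    return population


# Toy data set (hypothetical): "outlook" determines the class, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
```

Calling `seed_population(rows, labels, 5, random.Random(0))` returns five (attribute, value) pairs. Because "outlook" carries all of the information gain in this toy data, nearly every seeded rule tests that attribute, which is the intended effect of entropy-guided seeding; a genetic algorithm would then refine such rules through selection, crossover, and mutation.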