Research on Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy


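Cite this article:	TANG Hua (唐华), ZENG Bi-qing (曾碧卿). Research on Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy [J]. Acta Scientiarum Naturalium Universitatis Sunyatseni (中山大学学报, 自然科学版), 2007, 46(5): 18-21,24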

Authors:	TANG Hua (唐华); ZENG Bi-qing (曾碧卿)

Affiliation:	Department of Computer Engineering, Nanhai Campus, South China Normal University, Foshan 528225, Guangdong, China

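Abstract:	To address the text classification problem in data mining, this paper proposes Genetic-Miner (GM), a text classification rule extraction algorithm based on a genetic algorithm and information entropy, whose goal is to discover classification rules in data sets. GM first uses information entropy to generate the initial population and then applies an optimized genetic algorithm to extract the corresponding rules. GM is compared with two well-known algorithms of the same kind, Ant-Miner and CN2, on six standard public-domain data sets. The experimental results show that GM clearly outperforms Ant-Miner and CN2 in both predictive accuracy and rule simplicity, and that it greatly improves the comprehensibility of the discovered knowledge.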
Keywords:	text classification rules; knowledge discovery; information entropy; genetic algorithm; data mining
Article ID:	0529-6579(2007)05-0018-05
Revised:	2007-01-09

TANG Hua, ZENG Bi-qing. Research on Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy [J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2007, 46(5): 18-21,24

Authors:	TANG Hua; ZENG Bi-qing

Affiliation:	Computer Engineering Department of Nanhai Campus, South China Normal University, Foshan 528225, China

Abstract:	To address text classification problems in data mining, a rule extraction method based on a genetic algorithm and entropy, called Genetic-Miner (GM), is proposed. The goal of GM is to discover classification rules in data sets. It generates the initial population using entropy and then extracts classification rules with a genetic algorithm. The performance of GM was compared with that of two well-known algorithms, Ant-Miner and CN2, on six public-domain data sets. The results show that GM performs better than Ant-Miner and CN2 in both predictive accuracy and rule-list simplicity. It can also greatly improve the comprehensibility of the discovered knowledge.

Keywords:	text classification rule; data mining; knowledge discovery; information entropy; genetic algorithm
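
The abstract says only that information entropy is used to generate the initial population; it gives no formula. As a rough illustration of that first step, the sketch below computes the class entropy of the records covered by each attribute-value term and turns it into a seeding weight, so that terms covering purer class distributions are favored. The weighting (maximum entropy minus observed entropy, normalized) is an assumption in the style of the entropy heuristic used by Ant-Miner, not the method confirmed by the paper, and the names `class_entropy`, `term_entropies`, and `seeding_weights` as well as the toy data are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def term_entropies(records, labels):
    """Map each (attribute, value) term to the class entropy of the records it covers."""
    covered = {}
    for record, label in zip(records, labels):
        for attribute, value in record.items():
            covered.setdefault((attribute, value), []).append(label)
    return {term: class_entropy(lbls) for term, lbls in covered.items()}


def seeding_weights(entropies, n_classes):
    """Assumed weighting (not from the paper): lower entropy gives a higher
    weight; weights are normalized to sum to 1."""
    max_entropy = math.log2(n_classes)
    raw = {term: max_entropy - h for term, h in entropies.items()}
    total = sum(raw.values())
    if total == 0:
        # Every term is maximally uncertain: fall back to uniform weights.
        return {term: 1 / len(raw) for term in raw}
    return {term: w / total for term, w in raw.items()}


# Hypothetical toy data: each record is a dict of attribute -> value.
records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "play"]

entropies = term_entropies(records, labels)
weights = seeding_weights(entropies, n_classes=len(set(labels)))
```

In this toy data, terms that cover records of a single class have entropy 0 and therefore receive the largest weights. A genetic algorithm could then sample rule antecedents for its initial population in proportion to these weights, though how GM actually does this is not stated in the abstract.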
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
