Performance Comparison and Analysis of Several General Text Classification Algorithms

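Cite this article:	LU Wei, PENG Ya. Performance Comparison and Analysis of Several General Text Classification Algorithms[J]. Journal of Hunan University (Natural Science), 2007, 34(6): 67-69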

Author names:	LU Wei (卢苇), PENG Ya (彭雅)

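Author affiliations:	School of Software, Beijing Jiaotong University, Beijing 100044; College of Computer and Communication, Hunan University, Changsha, Hunan 410082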

Funding:	Key Science and Technology Research Project of the Ministry of Education (107114)

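Abstract (translated from the Chinese):	The characteristics of several typical text classification algorithms are analyzed, and their performance is evaluated comprehensively on a Chinese text dataset and an English text dataset. The experimental results show that, on the English data, the support vector machine achieves the best performance but has the highest time cost, while the Bayes algorithm is comparatively fast. On the Chinese data, the difficulty of word segmentation makes the algorithms generally perform worse than on an English dataset of the same size. The performance of all the algorithms improves as the training set grows.
Keywords (translated from the Chinese):	text classification; support vector machine; k-nearest neighbor; Bayes algorithm; TFIDF algorithm
Article ID:	1000-2472(2007)06-0067-03
Revision date:	2007-03-20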
Performance Comparison and Analysis of Several General Text Classification Algorithms

LU Wei, PENG Ya. Performance Comparison and Analysis of Several General Text Classification Algorithms[J]. Journal of Hunan University (Natural Science), 2007, 34(6): 67-69

Authors:	LU Wei, PENG Ya

Affiliation:	1. College of Software, Beijing Jiaotong Univ, Beijing 100044, China; 2. College of Computer and Communication, Hunan Univ, Changsha, Hunan 410082, China

Abstract:	Several typical text classification algorithms were analyzed, and their performance was evaluated comprehensively on Chinese and English text datasets. The experimental results support three conclusions. First, on the English dataset, the support vector machine performs best but has the highest time cost, and the Bayes algorithm is the fastest. Second, each algorithm performs worse on the Chinese dataset than on the English dataset because of the difficulty of Chinese word segmentation. Third, the performance of each algorithm improves as the training set grows.

Keywords:	text processing; support vector machine; k nearest neighbor; Bayes algorithm; TFIDF algorithm
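
The abstract attributes the weaker results on Chinese text to the difficulty of word segmentation. The following is a minimal sketch of why segmentation matters, not the authors' procedure: whitespace splitting yields usable tokens for English but returns a whole Chinese phrase as a single token. The function names `whitespace_tokens` and `char_bigrams` are hypothetical, the strings are toy examples rather than data from the paper, and character bigrams are used here only as a simple dictionary-free stand-in, since the abstract does not say which segmenter was used.

```python
# Illustrative only: toy strings, not data from the paper.

def whitespace_tokens(text):
    """Split on whitespace, the usual first step for English text."""
    return text.split()


def char_bigrams(text):
    """Overlapping character bigrams: an assumed, dictionary-free
    substitute for real word segmentation."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


english = "support vector machine"
chinese = "支持向量机"  # the same term in Chinese, written without spaces

english_tokens = whitespace_tokens(english)   # three separate word features
chinese_tokens = whitespace_tokens(chinese)   # one undivided token
chinese_bigrams = char_bigrams(chinese)       # overlapping two-character features

print(english_tokens)
print(chinese_tokens)
print(chinese_bigrams)
```

Any feature weighting such as TFIDF operates on tokens like these, so errors or ambiguity at the segmentation step carry over to every classifier built on top of it.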
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.