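Performance Comparison and Analysis of Several Common Text Classification Algorithms
Cite this article: LU Wei, PENG Ya. Performance comparison and analysis of several common text classification algorithms[J]. Journal of Hunan University (Natural Science), 2007, 34(6): 67-69
Authors: LU Wei (卢苇)  PENG Ya (彭雅)
Affiliations: College of Software, Beijing Jiaotong University, Beijing 100044; College of Computer and Communication, Hunan University, Changsha, Hunan 410082
Funding: Key Project of Science and Technology Research, Ministry of Education of China (107114)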
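Abstract: The characteristics of several typical text classification algorithms are analyzed, and their performance is evaluated comprehensively on a Chinese text dataset and an English text dataset. The experimental results show that on English text, the support vector machine performs best but has the largest time cost, while the Bayes algorithm is comparatively fast. On Chinese text, the difficulty of word segmentation makes the algorithms perform generally worse than they do on an English dataset of the same size. The performance of every algorithm improves as the training set grows.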
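The abstract attributes the weaker results on Chinese text to word segmentation. The paper does not say which segmenter it used; purely as an illustration, the sketch below uses forward maximum matching (FMM), a common dictionary-based baseline, to show how a greedy segmenter can split a sentence wrongly and so hand wrong terms to any of the classifiers. The function name, the `max_len=4` window and the toy dictionary are hypothetical.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: scan left to right and, at each position,
    take the longest dictionary word of at most max_len characters,
    falling back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real segmenter would use a large lexicon.
VOCAB = {"研究", "研究生", "生命", "起源"}
print(fmm_segment("研究生命起源", VOCAB))  # prints ['研究生', '命', '起源']
```

For 研究生命起源 ("study the origin of life"), FMM greedily takes 研究生 ("graduate student") and produces 研究生 / 命 / 起源 instead of the intended 研究 / 生命 / 起源, so the feature vector is built from the wrong words.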

Keywords: text classification  support vector machine  k nearest neighbor  Bayes algorithm  TFIDF algorithm
Article ID: 1000-2472(2007)06-0067-03
Revised: 2007-03-20
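The keywords list a TFIDF algorithm alongside k nearest neighbor. The abstract does not give the term-weighting formula, the value of k, or how the TFIDF classifier is built, so the sketch below makes assumptions: weights are tf·log(N/df) with L2 normalization, kNN uses a similarity-weighted vote with a hypothetical k=3, and the TFIDF method is modeled as a simplified centroid (Rocchio-style) classifier. Whether this matches the paper's variants is an assumption. Documents are taken as already tokenized, which for Chinese means already segmented; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_idf(train_docs):
    """idf(t) = log(N / df(t)) computed over the training token lists."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(doc))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term frequency times idf, L2-normalised; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def dot(u, v):
    # Iterate over the smaller sparse vector; for unit vectors this is the cosine.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(query, train_vecs, labels, idf, k=3):
    """Similarity-weighted vote among the k most similar training documents."""
    q = tfidf(query, idf)
    sims = sorted(((dot(q, v), y) for v, y in zip(train_vecs, labels)), reverse=True)
    votes = Counter()
    for sim, y in sims[:k]:
        votes[y] += sim
    return votes.most_common(1)[0][0]


def centroid_predict(query, train_vecs, labels, idf):
    """Simplified Rocchio-style classifier: the class whose summed (hence
    mean-direction) TF-IDF vector has the highest cosine with the query."""
    sums = defaultdict(Counter)
    for v, y in zip(train_vecs, labels):
        sums[y].update(v)  # Counter.update adds float weights term by term
    q = tfidf(query, idf)

    def score(y):
        c = sums[y]
        norm = math.sqrt(sum(w * w for w in c.values()))
        return dot(q, c) / norm if norm > 0 else 0.0

    return max(sums, key=score)


# Hypothetical toy corpus of pre-tokenized documents.
train = [["ball", "team", "goal"], ["team", "match", "goal"],
         ["stock", "market", "price"], ["market", "bank", "price"]]
labels = ["sport", "sport", "finance", "finance"]
idf = build_idf(train)
vecs = [tfidf(d, idf) for d in train]
print(knn_predict(["goal", "team", "win"], vecs, labels, idf))       # prints sport
print(centroid_predict(["market", "price"], vecs, labels, idf))      # prints finance
```

kNN must compare each query against every training vector at prediction time, while the centroid classifier compares it against one vector per class; that trade-off is one reason the two behave differently as the training set grows.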

Performance Comparison and Analysis of Several General Text Classification Algorithms
LU Wei, PENG Ya. Performance Comparison and Analysis of Several General Text Classification Algorithms[J]. Journal of Hunan University (Natural Science), 2007, 34(6): 67-69
Authors:LU Wei  PENG Ya
Affiliation:1. College of Software, Beijing Jiaotong Univ, Beijing 100044,China; 2. College of Computer and Communication, Hunan Univ, Changsha, Hunan 410082, China
Abstract: Several typical text classification algorithms were analyzed, and their performance was evaluated comprehensively on Chinese and English text datasets. The experimental results lead to three conclusions. First, on the English text dataset, the support vector machine has the best performance but the highest time cost, while the Bayes algorithm is the fastest. Second, every algorithm performs worse on the Chinese text dataset than on the English one because Chinese word segmentation is difficult. Third, the performance of every algorithm improves as the training set grows.
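The abstract names the Bayes algorithm as the fastest. One reason naive Bayes is cheap is that training is a single counting pass over the data and prediction is a sum of precomputed log-probabilities. The paper does not state which Bayes variant it used; the sketch below assumes a multinomial naive Bayes model with add-one (Laplace) smoothing over pre-tokenized documents. Function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial naive Bayes with add-one smoothing.
    A single counting pass over the data, which keeps training cheap."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for counts in term_counts.values() for t in counts}
    n = len(docs)
    model = {}
    for label, count in class_docs.items():
        total = sum(term_counts[label].values())
        denom = total + len(vocab)  # add-one smoothing over the vocabulary
        log_lik = {t: math.log((term_counts[label][t] + 1) / denom) for t in vocab}
        model[label] = (math.log(count / n), log_lik)
    return model, vocab


def predict_nb(model, vocab, doc):
    """Return the label maximising log P(c) + sum of log P(t | c) over known terms."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_lik) in model.items():
        score = log_prior + sum(log_lik[t] for t in doc if t in vocab)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [["ball", "team", "goal"], ["team", "match", "goal"],
         ["stock", "market", "price"], ["market", "bank", "price"]]
labels = ["sport", "sport", "finance", "finance"]
model, vocab = train_nb(train, labels)
print(predict_nb(model, vocab, ["goal", "team", "win"]))  # prints sport
```

Terms never seen in training (here "win") are simply skipped at prediction time, a common simplification.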
Keywords:text processing  support vector machine  k nearest neighbor  Bayes algorithm  TFIDF algorithm
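The abstract reports that the support vector machine classifies best but costs the most time. The paper does not say which SVM solver, kernel or multi-class scheme it used. As a swapped-in illustration only, the sketch below trains a binary linear SVM with Pegasos (stochastic sub-gradient descent on the regularized hinge loss) over sparse feature dicts. The regularization `lam`, the epoch count, the constant `"__bias__"` feature and the function names are hypothetical; regularizing the bias and omitting Pegasos' optional projection step are simplifications. Multi-class text categorization would need, for example, one such model per class.

```python
import random


def pegasos_train(xs, ys, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos. xs: sparse feature dicts; ys: labels in {-1, +1}.
    A constant '__bias__' feature stands in for the bias term (hypothetical)."""
    rng = random.Random(seed)
    data = [({**x, "__bias__": 1.0}, y) for x, y in zip(xs, ys)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Shrink every weight: the gradient step of the L2 regulariser.
            scale = 1.0 - eta * lam
            w = {f: wf * scale for f, wf in w.items()}
            if margin < 1:
                # Hinge-loss sub-gradient step for a margin violation.
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def pegasos_predict(w, x):
    """Sign of the linear score, including the constant bias feature."""
    score = sum(w.get(f, 0.0) * v for f, v in {**x, "__bias__": 1.0}.items())
    return 1 if score >= 0 else -1


xs = [{"ball": 1.0, "goal": 1.0}, {"team": 1.0, "goal": 1.0},
      {"stock": 1.0, "price": 1.0}, {"bank": 1.0, "price": 1.0}]
ys = [1, 1, -1, -1]
w = pegasos_train(xs, ys)
print(pegasos_predict(w, {"goal": 1.0}))  # prints 1
```

This sketch does not reproduce the paper's timing results; it only shows the shape of linear SVM training, which makes repeated passes over the data where naive Bayes makes one.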
This article is indexed in databases including CNKI, VIP and Wanfang Data.