The New Boundary Sample Selection Method and Its Application in the Text Classification
Original title: 一种新的样本选择算法及其在文本分类中的应用
Cite this article: WAN Zhongying, WANG Mingwen, ZUO Jiali, LIU Changhong. The New Boundary Sample Selection Method and Its Application in the Text Classification[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2019, 0(1): 76-83. DOI: 10.16357/j.cnki.issn1000-5862.2019.01.13
Authors: WAN Zhongying  WANG Mingwen  ZUO Jiali  LIU Changhong
Affiliation: School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Abstract: How to select a subset of important samples from a large training set while preserving classification performance is an important problem in pattern classification. To address this problem, a new sample selection algorithm is proposed and applied to text classification. Experiments are carried out on the standard Reuters-21578 document set, the Fudan document set, and the 20 Newsgroups document set. The results show that the proposed method effectively selects boundary samples, and that SVM and KNN classifiers achieve good classification results, especially on imbalanced document sets.

Keywords: boundary samples  sample selection  text classification  support vector machine (SVM)  K-nearest neighbors (KNN)
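
The abstract does not describe the selection algorithm itself, so the sketch below only illustrates the general idea of boundary sample selection followed by KNN classification. It is not the authors' method: the selection rule (keep a sample if any of its k nearest neighbors carries a different label), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def k_nearest(points, query, k, skip=None):
    """Return indices of the k points closest to query (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(points[i], query),
    )
    return order[:k]


def select_boundary_samples(points, labels, k=3):
    """Assumed heuristic: keep a sample if any of its k nearest neighbors
    has a different label, i.e. it lies near a class boundary."""
    keep = []
    for i, p in enumerate(points):
        neighbors = k_nearest(points, p, k, skip=i)
        if any(labels[j] != labels[i] for j in neighbors):
            keep.append(i)
    return keep


def knn_predict(points, labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    votes = Counter(labels[j] for j in k_nearest(points, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class "a" on the left, class "b" on the right.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (7.0,)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Only the two samples adjacent to the class boundary are retained.
selected = select_boundary_samples(points, labels, k=2)
sub_points = [points[i] for i in selected]
sub_labels = [labels[i] for i in selected]

print(selected)                                          # [3, 4]
print(knn_predict(sub_points, sub_labels, (2.5,), k=1))  # a
print(knn_predict(sub_points, sub_labels, (5.5,), k=1))  # b
```

On this toy set, a 1-nearest-neighbor classifier trained on the two retained samples labels the two query points the same way the full training set would, which is the kind of reduction the abstract refers to. Real text classification would use high-dimensional document vectors and a similarity measure suited to them.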
This article is indexed in CNKI and other databases.