
Research on the Effect of Stop Words Selection on Text Categorization (停用词的选取对文本分类效果的影响研究)
Citation: Cui Caixia. Research on the Effect of Stop Words Selection on Text Categorization[J]. Journal of Taiyuan Normal University: Natural Science Edition (太原师范学院学报(自然科学版)), 2008, 7(4): 91-93
Author: Cui Caixia (崔彩霞)
Affiliation: Department of Computer Science, Taiyuan Normal University, Taiyuan, Shanxi 030012, China
Abstract: This paper examines two commonly used statistical methods for selecting stop words and, drawing on linguistic knowledge, proposes a selection method that combines statistics with linguistics. Experiments on the Fudan corpus, using a support vector machine (SVM) as the classifier, show that the proposed method greatly reduces the dimensionality of the feature-word space while maintaining text categorization accuracy.

Keywords: text categorization; stop words; feature selection
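
The abstract does not name the two statistical selection methods or give the details of the combined method, so the following is only a minimal sketch of one common statistical approach, chosen here as an assumption: treating terms that occur in a large share of documents (high document frequency) as stop-word candidates and dropping them before classification. The function names, the `min_df_ratio` threshold, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def select_stop_words(docs, min_df_ratio=0.8):
    """Return terms occurring in at least min_df_ratio of all documents.

    min_df_ratio is an illustrative threshold, not a value from the paper.
    """
    df = document_frequency(docs)
    threshold = min_df_ratio * len(docs)
    return {term for term, count in df.items() if count >= threshold}


def remove_stop_words(docs, stop_words):
    """Drop every stop word from every tokenized document."""
    return [[t for t in tokens if t not in stop_words] for tokens in docs]


# Toy, already-tokenized documents (hypothetical data).
docs = [
    ["the", "match", "was", "won", "by", "the", "home", "team"],
    ["the", "market", "was", "down", "on", "the", "news"],
    ["the", "team", "signed", "a", "new", "player"],
    ["the", "bank", "was", "fined", "by", "the", "regulator"],
]

stop_words = select_stop_words(docs, min_df_ratio=0.75)
filtered = remove_stop_words(docs, stop_words)

# Vocabulary size before and after removal, i.e. the feature dimension.
vocab_before = {t for d in docs for t in d}
vocab_after = {t for d in filtered for t in d}
print(sorted(stop_words), len(vocab_before), len(vocab_after))
```

In a purely statistical scheme like this one, the threshold alone decides which terms are dropped. The abstract's proposal to add linguistic knowledge could be read as filtering or extending such a candidate list (for example by part of speech), but that reading is an interpretation rather than something the abstract states.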
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.