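Text Classification Algorithm Based on Multi-Type Feature Pooling
Cite this article: YANG Xin, JIANG Wei, LIU Xiao-Ling. Text classification algorithm based on multi-type feature pooling[J]. Journal of Sichuan University (Natural Science Edition), 2017, 54(2): 287-292.
Authors: YANG Xin, JIANG Wei, LIU Xiao-Ling
Affiliations: Sichuan Water Conservancy Vocational College; Motorola Solutions (China) Chengdu Branch; Sichuan Water Conservancy Vocational College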
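Abstract: Text classification is a task within text mining and has wide practical value in information retrieval, e-mail filtering, web page classification and related fields. The feature representations used by current text classification algorithms still carry insufficient information; to address this, this paper proposes a text classification algorithm based on pooling multiple types of features. In this algorithm, word vectors are first obtained by applying the skip-gram model to the segmented text. Several pooling operations are then applied over the word vectors of the entire text. Finally, the features produced by all pooling operations are fed together, as a single input, into a Softmax regression model that outputs the category of the text. Experiments on the test set of the text classification corpus provided by Fudan University (Fudan) show that the proposed multi-type feature pooling method improves text classification accuracy, demonstrating the effectiveness of the algorithm.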

Keywords: Chinese text classification; pooling; classification algorithm; Skip-gram; Softmax
Received: 2016/6/12
Revised: 2016/6/28
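The pooling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which pooling operations are combined, so max, mean and min pooling are assumed here; the word vectors are assumed to be already trained (for example with a skip-gram model); and the toy `embeddings` dictionary, the helper names `multi_pool` and `text_features`, and the rule of skipping out-of-vocabulary tokens are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def multi_pool(word_vectors: Sequence[Sequence[float]]) -> Vector:
    """Concatenate max-, mean- and min-pooled features over a text's word vectors.

    ASSUMPTION: the abstract does not name the pooling operations;
    max, mean and min pooling are chosen here only for illustration.
    """
    if not word_vectors:
        raise ValueError("text has no word vectors to pool")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    # zip(*...) regroups the vectors so each tuple holds one embedding
    # dimension across every word in the text.
    columns = list(zip(*word_vectors))
    max_pool = [max(col) for col in columns]
    mean_pool = [sum(col) / len(col) for col in columns]
    min_pool = [min(col) for col in columns]
    # One text-level feature vector of length 3 * dim.
    return max_pool + mean_pool + min_pool


def text_features(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Look up each segmented token's vector and pool them.

    HYPOTHETICAL: tokens missing from `embeddings` are simply skipped.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return multi_pool(vectors)


# Toy 2-dimensional vectors standing in for skip-gram word vectors.
embeddings = {"股票": [0.5, -1.0], "市场": [-0.5, 2.0], "上涨": [1.5, 0.0]}
features = text_features(["股票", "市场", "上涨", "未知词"], embeddings)
print(features)  # [1.5, 2.0, 0.5, 0.3333333333333333, -0.5, -1.0]
```

Because the pooled vectors are concatenated rather than averaged, each pooling view keeps its own dimensions and the downstream classifier can weight them separately; with d-dimensional word vectors the text feature has 3d entries under this three-pooling assumption.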

Chinese Text Categorization Based on Multi-Pooling
YANG Xin, JIANG Wei and LIU Xiao-Ling. Chinese Text Categorization Based on Multi-Pooling[J]. Journal of Sichuan University (Natural Science Edition), 2017, 54(2): 287-292.
Authors: YANG Xin, JIANG Wei and LIU Xiao-Ling
Institutions: Motorola Solutions (China) Chengdu Design Center; Sichuan Water Conservancy Vocational College
Abstract: Text classification is a task within text mining and is widely applied in information retrieval, e-mail filtering, web page classification and other fields. However, the feature representations used by current text classification algorithms still carry insufficient information. This paper proposes a text classification algorithm based on pooling multiple types of features. In the algorithm, word vectors are first obtained by applying the skip-gram model to the segmented text. Several pooling methods are then applied over these word vectors to obtain vectors for the entire text. Finally, the features from all pooling methods are combined into a single input to a softmax regression model, which outputs the category. Experiments on the text classification corpus provided by Fudan University (Fudan) show that the proposed method improves classification accuracy, demonstrating its effectiveness.
Keywords: Chinese text categorization; pooling; classification algorithm; Skip-gram; Softmax
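The final step, feeding the combined pooled features into a Softmax regression model, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the optimiser or its settings, so batch gradient descent on the cross-entropy loss with hypothetical `lr` and `epochs` values is used here, and the two-dimensional vectors in `X` are toy stand-ins for real pooled text features.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtracting the largest score avoids overflow in math.exp.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """Multinomial logistic (softmax) regression fitted by batch gradient descent."""

    def __init__(self, n_features: int, n_classes: int) -> None:
        # Zero initialisation is acceptable because the cross-entropy loss is convex.
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        # One linear score per class, then softmax over the class scores.
        scores = [sum(wj * xj for wj, xj in zip(w, x)) + bc
                  for w, bc in zip(self.W, self.b)]
        return softmax(scores)

    def predict(self, x: Sequence[float]) -> int:
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int],
            lr: float = 0.5, epochs: int = 200) -> "SoftmaxRegression":
        # HYPOTHETICAL defaults: lr and epochs are not taken from the paper.
        n, k, d = len(X), len(self.W), len(self.W[0])
        for _ in range(epochs):
            grad_W = [[0.0] * d for _ in range(k)]
            grad_b = [0.0] * k
            for x, label in zip(X, y):
                probs = self.predict_proba(x)
                for c in range(k):
                    # d(cross-entropy)/d(score_c) = p_c - 1[c == label]
                    err = probs[c] - (1.0 if c == label else 0.0)
                    grad_b[c] += err
                    for j in range(d):
                        grad_W[c][j] += err * x[j]
            # Average the gradient over the batch and take one descent step.
            for c in range(k):
                self.b[c] -= lr * grad_b[c] / n
                for j in range(d):
                    self.W[c][j] -= lr * grad_W[c][j] / n
        return self


# Toy 2-dimensional "pooled text features" for two hypothetical categories.
X = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.3, 1.8]]
y = [0, 0, 1, 1]
model = SoftmaxRegression(n_features=2, n_classes=2).fit(X, y)
print([model.predict(x) for x in X])  # [0, 0, 1, 1]
```

In practice a library implementation of multinomial logistic regression would usually replace this hand-written loop; it is spelled out here only to show that the classifier is a single linear layer followed by softmax over the concatenated pooled vector.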