
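Chinese text classification using an attention-based bidirectional LSTM combined with part-of-speech information
Cite this article: GAO Chengliang, XU Hua, GAO Kai. Chinese text classification using an attention-based bidirectional LSTM combined with part-of-speech information[J]. Journal of Hebei University of Science and Technology, 2018, 39(5): 447-454.
Authors: GAO Chengliang, XU Hua, GAO Kai
Affiliations: School of Information Science and Engineering, Hebei University of Science and Technology; Department of Computer Science, Tsinghua University; School of Information Science and Engineering, Hebei University of Science and Technology
Funding: National Natural Science Foundation of China (61673235, 61772075); Natural Science Foundation of Hebei Province (F2017208012); Hebei Province Master's Graduate Innovation Funding Project (CXZZSS2017095)
Abstract: LSTM-based Chinese text classification methods can correctly identify the category a text belongs to, but they mainly attend to learning topic-related text fragments and often fail to use other information about words, in particular the implicit feature information among parts of speech. To make effective use of word part-of-speech information, learn a large amount of context-dependent feature information, and improve classification performance, a Chinese text classification method incorporating part-of-speech information is proposed, which can conveniently learn implicit feature information from words and their parts of speech. Open-source data are used and a series of comparative experiments are designed to verify the effectiveness of the method. The experimental results show that the attention-based bidirectional LSTM model combined with part-of-speech information classifies Chinese text better than some common algorithms. Identifying the category of a text is therefore not only highly correlated with the semantic information of words but also closely related to their part-of-speech information.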

Keywords: natural language processing; Chinese text classification; attention mechanism; LSTM; part-of-speech
Received: 2018-09-01
Revised: 2018-10-08

Attention-based BILSTM network with part-of-speech features for Chinese text classification
GAO Chengliang, XU Hua and GAO Kai. Attention-based BILSTM network with part-of-speech features for Chinese text classification[J]. Journal of Hebei University of Science and Technology, 2018, 39(5): 447-454.
Authors: GAO Chengliang, XU Hua and GAO Kai
Abstract: LSTM-based methods for Chinese text classification can correctly identify the category of a text, but they mainly focus on learning the text fragments related to the topic and make little use of other information about words in context, especially the implicit feature information carried by part-of-speech. To use the part-of-speech information of words effectively, learn more context-dependent features, and thereby improve classification performance, this paper proposes a Chinese text classification method that combines part-of-speech information and can readily learn implicit features from words and their parts of speech. To verify the effectiveness of the attention-based BILSTM model with part-of-speech features, a series of comparative experiments is designed and conducted on an open-source dataset. The experimental results show that the attention-based BILSTM model with part-of-speech features outperforms several baselines on Chinese text classification, which supports the effectiveness of the proposed model. This indicates that the category of a text is not only highly correlated with the semantic information of its words but also strongly related to the words' part-of-speech information.
Keywords: natural language processing; Chinese text classification; attention mechanism; LSTM; part-of-speech
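
Illustrative sketch: the abstract does not say how word and part-of-speech information are combined or how attention is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each word vector is concatenated with the vector of its part-of-speech tag, and that attention is a dot-product score against a query vector followed by a softmax-weighted sum. The BILSTM layer is omitted; in a full model its hidden states would take the place of `inputs` in `attention_pool`. All names, dimensions, tags, and the random embeddings are hypothetical.

```python
import math
import random


def make_embedding(vocab, dim, rng):
    """Give each symbol a small random vector (a stand-in for learned embeddings)."""
    return {sym: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for sym in sorted(vocab)}


def combine_inputs(words, tags, word_emb, pos_emb):
    """Assumed combination: concatenate each word vector with its POS-tag vector."""
    return [word_emb[w] + pos_emb[t] for w, t in zip(words, tags)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Score each state by its dot product with `query`, then return the weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


rng = random.Random(0)
words = ["我", "喜欢", "足球"]   # a segmented example sentence ("I like football")
tags = ["r", "v", "n"]           # hypothetical POS tags: pronoun, verb, noun
word_emb = make_embedding(set(words), 4, rng)
pos_emb = make_embedding(set(tags), 2, rng)

inputs = combine_inputs(words, tags, word_emb, pos_emb)   # 3 vectors of length 4 + 2
query = [rng.uniform(-0.1, 0.1) for _ in range(6)]        # stand-in for a learned query
sentence_vec, weights = attention_pool(inputs, query)
```

Here `sentence_vec` is a single fixed-length vector that a classifier layer could consume, and `weights` shows how much each token contributes to it.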