
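Abstractive summarization model incorporating lexical features
Cite this article:Jiang Yuehua, Ding Lei, Li Jiao'e, Du Haoxuan, Gao Kai. Abstractive summarization model incorporating lexical features[J]. Journal of Hebei University of Science and Technology, 2019, 40(2): 152-158
Authors:Jiang Yuehua  Ding Lei  Li Jiao'e  Du Haoxuan  Gao Kai
Affiliations:School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018; Information Center, Shijiazhuang Public Security Bureau, Shijiazhuang, Hebei 050021; School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710126
Funding:National Natural Science Foundation of China (61772075); Natural Science Foundation of Hebei Province (F2017208012); Ministry of Education Humanities and Social Sciences Research Special Task Project (Research on Engineering Science and Technology Talent Cultivation) (17JDGC022)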
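Abstract:To use lexical features (n-gram and part-of-speech information) during generation to identify more key vocabulary and further improve summary quality, an abstractive summarization algorithm is proposed that is based on the sequence-to-sequence (Seq2Seq) structure and the attention mechanism and that incorporates lexical features. The input layer of the algorithm merges the part-of-speech vector with the word vector and passes the result to the encoder layer. The encoder layer consists of a bidirectional LSTM, and the context vector is formed from the encoder output and the lexical feature vector extracted by a convolutional neural network. In the model, the convolutional neural network layer handles lexical information, the bidirectional LSTM handles sentence information, and the decoder layer uses a unidirectional LSTM to decode the context vector and generate the summary. Experimental results show that, on both a public dataset and a self-collected dataset, the summarization model incorporating lexical features outperforms the comparison models; on the public dataset, the ROUGE-1, ROUGE-2, and ROUGE-L scores improve by 0.024, 0.033, and 0.030, respectively. Summary generation is therefore related not only to features such as the semantics and topic of the article but also to lexical features, and the proposed model has some reference value for research on abstractive summarization that incorporates key information.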
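The lexical-feature path described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give filter widths, the pooling scheme, or how the two vectors are merged, so the sketch assumes concatenation at the input layer, a ReLU convolution over windows of `n` tokens, and max-over-time pooling. All names (`concat_features`, `conv1d_max_pool`), dimensions, and random weights are hypothetical, and the LSTM encoder, attention, and decoder are omitted.

```python
# Illustrative sketch only: dimensions, weights, and function names are
# hypothetical and are not taken from the paper.
import random


def concat_features(word_vecs, pos_vecs):
    """Input layer (assumed): join each word vector with its POS vector."""
    return [w + p for w, p in zip(word_vecs, pos_vecs)]


def conv1d_max_pool(inputs, filters, n):
    """Slide each filter over windows of n consecutive tokens (n-grams),
    apply ReLU, then max-pool over positions: one feature per filter."""
    features = []
    for filt in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe floor
        for start in range(len(inputs) - n + 1):
            # Flatten the n token vectors in this window into one vector.
            window = [x for tok in inputs[start:start + n] for x in tok]
            activation = max(0.0, sum(a * b for a, b in zip(window, filt)))
            best = max(best, activation)
        features.append(best)
    return features


random.seed(0)
seq_len, word_dim, pos_dim, n, num_filters = 6, 4, 2, 3, 5
word_vecs = [[random.uniform(-1, 1) for _ in range(word_dim)]
             for _ in range(seq_len)]
pos_vecs = [[random.uniform(-1, 1) for _ in range(pos_dim)]
            for _ in range(seq_len)]

# Each encoder input has word_dim + pos_dim components.
encoder_inputs = concat_features(word_vecs, pos_vecs)

# One filter spans n tokens, so it has n * (word_dim + pos_dim) weights.
filters = [[random.uniform(-1, 1) for _ in range(n * (word_dim + pos_dim))]
           for _ in range(num_filters)]
lexical_features = conv1d_max_pool(encoder_inputs, filters, n)
```

In the model described above, `encoder_inputs` would feed the bidirectional LSTM, and `lexical_features` would be combined with the encoder output to form the context vector; how that combination is done is not stated in the abstract.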

Keywords:natural language processing   text summarization   attention mechanism   LSTM   CNN
Received:2018-10-14
Revised:2019-03-01

Abstractive summarization model considering hybrid lexical features
JIANG Yuehua, DING Lei, LI Jiaoe, DU Haoxuan and GAO Kai. Abstractive summarization model considering hybrid lexical features[J]. Journal of Hebei University of Science and Technology, 2019, 40(2): 152-158
Authors:JIANG Yuehua  DING Lei  LI Jiaoe  DU Haoxuan  GAO Kai
Abstract:To use lexical features (n-gram and part-of-speech information) to identify more key vocabulary during summary generation and thereby improve summary quality, an abstractive summarization algorithm is proposed that is based on the sequence-to-sequence (Seq2Seq) structure and the attention mechanism and that incorporates lexical features. The input layer of the algorithm combines the part-of-speech vector with the word vector and feeds the result to the encoder layer. The encoder layer is a bidirectional LSTM, and the context vector is composed of the encoder output and the lexical feature vector extracted by a convolutional neural network. In the model, the convolutional neural network layer handles lexical information, the bidirectional LSTM handles sentence information, and the decoder layer uses a unidirectional LSTM to decode the context vector and generate the summary. Experiments on a public dataset and a self-collected dataset show that the summarization model considering lexical features performs better than the comparison models. On the public dataset, the ROUGE-1, ROUGE-2, and ROUGE-L scores improve by 0.024, 0.033, and 0.030, respectively. Summary generation is therefore related not only to the semantics and topics of the article but also to its lexical features. The proposed model offers some reference value for research on abstractive summarization that integrates key information.
Keywords:natural language processing   text summarization   attention mechanism   LSTM   CNN
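For readers unfamiliar with the reported metrics, the following is a minimal sketch of recall-oriented ROUGE-N and an LCS-based ROUGE-L recall. The abstract does not say which ROUGE variant (recall, precision, or F-measure) or which toolkit was used, so the recall form, the function names, and the toy sentences here are assumptions for illustration only.

```python
# Illustrative sketch: recall-oriented ROUGE on whitespace-tokenized text.
# The variant and the example sentences are assumptions, not from the paper.
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Share of reference n-grams that also occur in the candidate,
    with each n-gram's match count clipped to its candidate count."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    return lcs_length(candidate, reference) / len(reference) if reference else 0.0


reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()

r1 = rouge_n_recall(candidate, reference, 1)  # 5 of 6 unigrams match
r2 = rouge_n_recall(candidate, reference, 2)  # 3 of 5 bigrams match
rl = rouge_l_recall(candidate, reference)     # LCS has 5 of 6 tokens
```

On this scale, each score lies between 0 and 1, so the reported gains of 0.024, 0.033, and 0.030 are absolute differences in score.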
This article is indexed in CNKI, Wanfang Data, and other databases.