Research on topic text generation technology based on external knowledge filtering
Cite this article: Wang Pei, Yang Pin, Cheng Pengsen, Dai Jinqiao, Jia Peng. Research on topic text generation technology based on external knowledge filtering[J]. Journal of Sichuan University (Natural Science Edition), 2024, 61(1): 012003.
Authors: Wang Pei, Yang Pin, Cheng Pengsen, Dai Jinqiao, Jia Peng
Institution: School of Cyber Science and Engineering, Sichuan University (all authors)
Funding: Key Research and Development Project of the Science and Technology Department of Sichuan Province (2021YFG0156)
Abstract: Topic text generation is a challenging natural language generation task, chiefly because the source carries far less information than the text to be generated. To address this, this paper proposes Trans-K, a topic text generation model based on external knowledge filtering, which enriches the source information with external knowledge related to the topic words and thereby improves the quality of the generated text. To resolve the polysemy problem that arises when external knowledge is introduced, a topic vector calculation method based on linear transformation is proposed to select external knowledge that is semantically consistent with the topic words. An external weight calculation method based on the attention mechanism assigns each external word a topic weight so that it fits the semantics of the text more closely. To keep topic words (including candidate words) from appearing repeatedly in the generated text, an internal weight calculation method based on the multi-head attention mechanism is proposed. Experiments on the EASSY dataset show that Trans-K outperforms the baselines on every reported metric of generated text quality. In addition, human evaluation shows that the model generates text that is more relevant to the topic, more coherent, and more semantically logical.

Keywords: natural language generation; topic text generation; Transformer; HowNet; knowledge enhancement
Received: 2022/9/20
Revised: 2022/12/9
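
The abstract names two of its steps only at a high level: a topic vector obtained by a linear transformation, used to filter external words, and an attention-based weight for each retained word. The sketch below is one possible reading of those two steps, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of the mean topic-word embedding, cosine similarity with a fixed threshold as the filter, scaled dot-product scores for the weights, and the toy 2-dimensional embeddings.

```python
import numpy as np

def topic_vector(topic_embs, W, b):
    """Assumed form: a linear transform of the mean topic-word embedding."""
    return W @ topic_embs.mean(axis=0) + b

def filter_candidates(cands, cand_embs, t, threshold=0.0):
    """Keep candidate words whose cosine similarity to t exceeds threshold."""
    norms = np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(t) + 1e-12
    sims = cand_embs @ t / norms
    keep = sims > threshold
    return [c for c, k in zip(cands, keep) if k], cand_embs[keep]

def external_weights(kept_embs, t):
    """Softmax over scaled dot-product scores: one weight per kept word."""
    if kept_embs.shape[0] == 0:
        return np.zeros(0)
    scores = kept_embs @ t / np.sqrt(t.shape[0])
    scores = scores - scores.max()  # subtract the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

# Toy data (hypothetical): two topic words and three candidate external words.
topic_embs = np.array([[1.0, 0.0], [1.0, 0.2]])
W, b = np.eye(2), np.zeros(2)  # identity transform, for illustration only
cands = ["fruit", "phone", "orchard"]
cand_embs = np.array([[0.9, 0.1], [-1.0, 0.2], [0.8, 0.3]])

t = topic_vector(topic_embs, W, b)
kept, kept_embs = filter_candidates(cands, cand_embs, t)
weights = external_weights(kept_embs, t)
print(kept, weights)
```

In this toy setup the candidate pointing away from the topic vector is dropped and the remaining words receive weights that sum to 1. In a trained model `W` and `b` would be learned parameters, and the candidates would come from a knowledge base such as HowNet rather than a hand-written list.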