Question generation method based on conditional variational autoencoder
Citation: LIU Dong, HONG Yu, SU Yu-lan, ZHANG Min. Question generation method based on conditional variational autoencoder[J]. Journal of Shandong University (Natural Science), 2023, 58(1): 48-58. DOI: 10.6040/j.issn.1671-9352.2.2021.035
Authors: LIU Dong (刘东), HONG Yu (洪宇), SU Yu-lan (苏玉兰), ZHANG Min (张民)
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
Funding: National Key Research and Development Program of China (2020YFB1313601); National Natural Science Foundation of China (62076174)
Abstract: A conditional variational autoencoder (CVAE) is introduced as an auxiliary module into the encoding and decoding process of a pre-trained language model. It improves the robustness of the model through data augmentation (latent semantic expansion) and establishes a high-dimensional distributional link between declarative and interrogative sentences, so that one-to-many question generation is achieved by sampling from that distribution. The results show that incorporating the CVAE not only produces diverse questions but also improves the performance of the question generation model. On Split1 and Split2, two answer-known question generation datasets derived from SQuAD, the BLEU-4 score is raised to 20.75% and 21.61%, respectively.

Keywords: conditional variational autoencoder; question generation; pre-trained language model
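
The abstract attributes one-to-many generation to sampling from a learned distribution. The record gives no architectural details, so the following is only a minimal sketch of the generic CVAE sampling step, not the authors' implementation. It assumes a diagonal Gaussian latent distribution; the function names `sample_latents` and `kl_diag_gaussians` and all numeric values are hypothetical. Each sampled latent vector would, in a full model, condition the decoder to produce a different question for the same declarative input.

```python
import math
import random


def sample_latents(mu, log_var, num_samples, seed=0):
    """Draw latent vectors z = mu + sigma * eps, with eps ~ N(0, I).

    This is the standard reparameterization trick for a diagonal Gaussian.
    mu and log_var are equal-length lists of floats (hypothetical outputs
    of a prior network conditioned on the declarative sentence).
    """
    rng = random.Random(seed)
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(num_samples)
    ]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    In a CVAE training objective this term typically pulls the posterior q
    (which sees the target question) toward the prior p (which does not).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += 0.5 * (
            lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        )
    return total


# Hypothetical 4-dimensional prior parameters for one input sentence.
prior_mu = [0.1, -0.3, 0.0, 0.5]
prior_log_var = [0.0, -1.0, 0.2, -0.5]

# Three samples stand in for three candidate questions from one input.
latents = sample_latents(prior_mu, prior_log_var, num_samples=3)
```

Fixing `seed` makes the draws reproducible; varying it, or raising `num_samples`, is the simplest way to obtain more candidates from the same conditioning input.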