
基于事实和语义一致性的生成文本检测
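Cite this article: 董腾飞, 杨频, 徐宇, 代金鞘, 贾鹏. 基于事实和语义一致性的生成文本检测 [J]. 四川大学学报(自然科学版), 2023, 60(4): 042002.
Authors: 董腾飞 (DONG Teng-Fei), 杨频 (YANG Pin), 徐宇 (XU Yu), 代金鞘 (DAI Jin-Qiao), 贾鹏 (JIA Peng)
Affiliation: College of Cyber Science and Engineering, Sichuan University (all authors)
Funding: Key Research and Development Project of the Sichuan Provincial Department of Science and Technology (2021YFG0156)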
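Abstract: Malicious abuse of text generation technology is an increasingly serious problem, which makes generated text detection essential. Existing detection methods rely on statistical anomaly features derived from specific datasets, so they generalize poorly. Considering the factual errors and semantic conflicts that tend to appear in every kind of generated text, this paper proposes a generated text detection method based on factual and semantic consistency. The method uses entities to compare the text against an external knowledge base, which yields a factual consistency feature. It also uses textual entailment to infer the relationship between the preceding and following parts of the text, which yields a semantic consistency feature. Finally, the two kinds of features are concatenated with the output hidden vector of RoBERTa and fed into a linear classification layer for prediction. Experimental results show that the method achieves higher accuracy and better generalization than current detection methods.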

Keywords: text generation; generated text detection; external knowledge base; textual entailment
Received: 2022/7/1
Revised: 2022/9/12

Generated text detection based on factual and semantic consistency
DONG Teng-Fei, YANG Pin, XU Yu, DAI Jin-Qiao, JIA Peng. Generated text detection based on factual and semantic consistency [J]. Journal of Sichuan University (Natural Science Edition), 2023, 60(4): 042002.
Authors: DONG Teng-Fei, YANG Pin, XU Yu, DAI Jin-Qiao, JIA Peng
Institution: College of Cyber Science and Engineering, Sichuan University (all authors)
Abstract: The malicious abuse of text generation technology has become increasingly serious, which makes the detection of generated text considerably important. Existing detection methods mainly rely on statistical anomaly features derived from specific datasets, which leads to poor generalization ability. Considering the common problems of factual errors and semantic conflicts in generated text, this paper proposes a generated text detection method based on factual and semantic consistency. Using the entities in the text, the proposed method compares the text with an external knowledge base to obtain the factual consistency feature of the text. In addition, textual entailment is used to infer the semantic relationship between the preceding and following text to obtain the semantic consistency feature of the text. Finally, these two types of features are concatenated with the RoBERTa output hidden vector and fed into a linear classification layer for prediction. The experimental results show that the proposed method has higher accuracy and better generalization ability than existing detection methods.
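The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two consistency features are computed or sized, so everything here is an assumption made for illustration: the toy knowledge base, the (entity, relation, value) claim format, the use of label proportions as the semantic feature, the single-logit sigmoid output, and all function names. A real system would presumably take the hidden vector from RoBERTa and the entailment labels from a trained textual entailment model; here both are passed in as plain Python values.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Everything below is a hypothetical illustration; none of these names,
# feature definitions, or values come from the paper.
Claim = Tuple[str, str, str]                # (entity, relation, value) found in the text
KnowledgeBase = Dict[Tuple[str, str], str]  # toy stand-in for an external knowledge base
NLI_LABELS = ("entailment", "neutral", "contradiction")


def factual_consistency(claims: Sequence[Claim], kb: KnowledgeBase) -> float:
    """Share of checkable claims that agree with the knowledge base."""
    checkable = [c for c in claims if (c[0], c[1]) in kb]
    if not checkable:
        return 1.0  # assumption: nothing checkable counts as consistent
    agreeing = sum(
        1 for entity, relation, value in checkable
        if kb[(entity, relation)] == value
    )
    return agreeing / len(checkable)


def semantic_consistency(pair_labels: Sequence[str]) -> List[float]:
    """Share of each entailment label over (preceding, following) text pairs."""
    if not pair_labels:
        return [0.0, 0.0, 0.0]
    total = len(pair_labels)
    return [sum(1 for p in pair_labels if p == label) / total
            for label in NLI_LABELS]


def fuse(hidden: Sequence[float], fact: float, sem: Sequence[float]) -> List[float]:
    """Concatenate the hidden vector with the two consistency features."""
    return list(hidden) + [fact] + list(sem)


def linear_classify(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """One linear layer followed by a sigmoid (assumed binary output)."""
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy usage with made-up inputs and untrained (all-zero) weights.
kb = {("Paris", "capital_of"): "France", ("Einstein", "born_in"): "Ulm"}
claims = [
    ("Paris", "capital_of", "France"),   # agrees with the toy knowledge base
    ("Einstein", "born_in", "Berlin"),   # conflicts with the toy knowledge base
    ("Mars", "moons", "2"),              # not covered, so it is skipped
]
fact = factual_consistency(claims, kb)
sem = semantic_consistency(["entailment", "neutral", "contradiction", "contradiction"])
features = fuse([0.1, -0.2], fact, sem)
score = linear_classify(features, [0.0] * len(features), 0.0)
print(features, score)
```

With all-zero weights the score carries no information. The point of the sketch is only the shape of the pipeline: two scalar or low-dimensional consistency signals are appended to the encoder representation before a single linear decision.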