
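Dialogue Text Recognition Based on an Improved Transformer-Based Bidirectional Encoder (BERT)
Citation: Zhang Yangfan, Ding Meng. Dialogue text recognition based on an improved Transformer-based bidirectional encoder[J]. Science Technology and Engineering, 2022, 22(29): 12945-12953.
Authors: Zhang Yangfan, Ding Meng
Affiliation: Department of Criminal Investigation, People's Public Security University of China
Funding: Open Project of the Public Security Behavioral Science Laboratory, People's Public Security University of China (2021SYS07)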
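Abstract: Text analysis techniques can help police extract electronic data quickly and accurately, and using a pre-trained language model for downstream tasks effectively mitigates overfitting. When BERT is fine-tuned for text classification, the hidden-layer representation at the [CLS] position is usually taken as the sentence vector and fed into a fully connected layer for classification; this discards part of the semantic information and lowers classification accuracy. To address this problem, a semantic feature extractor is appended to BERT to make full use of its high-level semantic information: the hidden states output by BERT are convolved with two-dimensional convolution kernels of different sizes, the channels are then weighted by a Squeeze-and-Excitation module with shared weights, the results are passed through max-pooling layers and concatenated, and the concatenated features are finally fed into a fully connected layer for classification. Tests on a self-built dataset of case-related dialogue texts and on the public THUCNews dataset show that the improved fine-tuned BERT model achieves better classification performance than the BERT baseline and other classification models.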
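As a reading aid, the classification head described in the abstract can be written out as below. The notation is an assumption, not the paper's: each kernel of height k is assumed to span the full hidden size d (TextCNN-style), so W^{(k)}_c is a k x d matrix and the angle brackets denote the element-wise (Frobenius) inner product over a window of k consecutive hidden states; a ReLU is assumed after each convolution; every kernel size is assumed to produce the same number C of channels, so a single SE module with reduction ratio r can be shared across branches; and SE reweighting is applied before max pooling, in the order the abstract lists the steps.

```latex
\begin{aligned}
H &= \mathrm{BERT}(x) = [h_1; \dots; h_n] \in \mathbb{R}^{n \times d}
  &&\text{(the baseline classifies from } h_{\texttt{[CLS]}} = h_1 \text{ alone)}\\
u^{(k)}_{c,t} &= \mathrm{ReLU}\big(\langle W^{(k)}_c,\, H_{t:t+k-1} \rangle + b^{(k)}_c\big)
  &&k \in K,\; c = 1,\dots,C,\; t = 1,\dots,n-k+1\\
z^{(k)}_c &= \frac{1}{n-k+1} \sum_{t=1}^{n-k+1} u^{(k)}_{c,t}
  &&\text{(squeeze)}\\
s^{(k)} &= \sigma\big(W_2\, \mathrm{ReLU}(W_1 z^{(k)})\big),\quad
  W_1 \in \mathbb{R}^{\frac{C}{r} \times C},\; W_2 \in \mathbb{R}^{C \times \frac{C}{r}}
  &&\text{(excitation, } W_1, W_2 \text{ shared over all } k\text{)}\\
m^{(k)}_c &= \max_{t}\; s^{(k)}_c\, u^{(k)}_{c,t}
  &&\text{(channel reweighting, then max pooling)}\\
p &= \mathrm{softmax}\big(W_o\,[m^{(k_1)}; \dots; m^{(k_{|K|})}] + b_o\big)
  &&\text{(concatenate, fully connected layer)}
\end{aligned}
```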

Keywords: electronic data forensics; text classification; dialogue text; BERT
Received: 2022-01-15
Revised: 2022-09-21

Research on Involved Text Recognition Based on Improved BERT
Zhang Yangfan, Ding Meng. Research on Involved Text Recognition Based on Improved BERT[J]. Science Technology and Engineering, 2022, 22(29): 12945-12953.
Authors: Zhang Yangfan, Ding Meng
Institution: Department of Criminal Investigation, People's Public Security University of China
Abstract: Text analysis technology can help police extract electronic data quickly and accurately, and using a pre-trained language model for downstream tasks can effectively reduce overfitting. When fine-tuned BERT is used for text classification, the hidden-layer representation corresponding to the [CLS] token is generally fed into a fully connected layer as the sentence vector for classification. This loses some semantic information and lowers classification accuracy. To solve this problem, a semantic feature extractor is attached after BERT to make full use of its high-level semantic information: two-dimensional convolution kernels of different sizes convolve the hidden states output by BERT, a Squeeze-and-Excitation module with shared weights then weights the channels, the results are concatenated after max-pooling layers, and the concatenated features are finally fed into a fully connected layer for classification. Tests on a self-built dialogue text dataset and the public THUCNews dataset show that the improved fine-tuned BERT model achieves better classification performance than the BERT baseline model and other classification models.
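To make the data flow of the proposed head concrete, here is a toy, pure-Python forward pass. It is a sketch under stated assumptions, not the authors' implementation: a random matrix stands in for BERT's last-layer hidden states, all weights are random rather than trained, each kernel is assumed to span the full hidden size d, convolution biases are omitted, and the function names (`conv_branch`, `se_reweight`, `classify_head`) and all sizes are hypothetical.

```python
import math
import random

# Hypothetical toy sketch of the head described in the abstract; function
# names, sizes and random weights are illustrative, not taken from the paper.

def relu(x):
    return x if x > 0.0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def conv_branch(H, filters, k):
    """Slide each k x d filter down the n x d hidden-state matrix H
    (assumed kernel width = d, bias omitted). Returns C maps of length n-k+1."""
    n, d = len(H), len(H[0])
    return [[relu(sum(F[i][j] * H[t + i][j] for i in range(k) for j in range(d)))
             for t in range(n - k + 1)]
            for F in filters]

def se_reweight(maps, W1, W2):
    """Squeeze-and-Excitation over channels: average-pool each channel,
    apply FC -> ReLU -> FC -> sigmoid, then scale each channel by its gate."""
    z = [sum(ch) / len(ch) for ch in maps]            # squeeze: one value per channel
    hidden = [relu(v) for v in matvec(W1, z)]         # reduce C -> C/r
    gates = [sigmoid(v) for v in matvec(W2, hidden)]  # expand C/r -> C
    return [[g * x for x in ch] for g, ch in zip(gates, maps)]

def classify_head(H, branch_filters, W1, W2, W_out, b_out):
    """Multi-size conv -> shared SE -> max-pool -> concat -> FC -> softmax."""
    pooled = []
    for k, filters in branch_filters.items():
        maps = se_reweight(conv_branch(H, filters, k), W1, W2)  # same W1, W2 for every k
        pooled.extend(max(ch) for ch in maps)                   # max-pool over positions
    logits = [l + b for l, b in zip(matvec(W_out, pooled), b_out)]
    top = max(logits)                                           # subtract max for stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d, C, r, num_classes = 8, 16, 4, 2, 3   # toy sizes (BERT-base would have d = 768)
kernel_sizes = (2, 3, 4)
H = rand_matrix(n, d, rng)                 # stand-in for BERT's last-layer hidden states
branch_filters = {k: [rand_matrix(k, d, rng) for _ in range(C)] for k in kernel_sizes}
W1 = rand_matrix(C // r, C, rng)           # SE reduction weights, shared by all branches
W2 = rand_matrix(C, C // r, rng)           # SE expansion weights, shared by all branches
W_out = rand_matrix(num_classes, C * len(kernel_sizes), rng)
b_out = [0.0] * num_classes

probs = classify_head(H, branch_filters, W1, W2, W_out, b_out)
print([round(p, 3) for p in probs])
```

Because the sigmoid gates are strictly positive, scaling a channel before max pooling does not change which position the pooling selects; in this sketch the SE step only changes the relative magnitudes of the pooled features that reach the fully connected layer.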
Keywords: digital forensics; text classification; dialogue text; BERT