
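BERT-based automatic extraction of illegal-fact elements from litigation cases
Cite this article: Cui Bin, Zou Lei, Xu Mingyue. BERT-based automatic extraction of illegal-fact elements from litigation cases [J]. Science Technology and Engineering (科学技术与工程), 2021, 21(9): 3669-3675.
Authors: Cui Bin (崔斌)  Zou Lei (邹蕾)  Xu Mingyue (徐明月)
Affiliation: Information Engineering Division, Beijing Jinghang Institute of Computing and Communication (北京京航计算通讯研究所), Beijing 100074
Abstract: Extracting illegal-fact elements from litigation cases depends heavily on domain expertise. To address this, an automatic extraction method based on bidirectional encoder representations from transformers (BERT) is proposed. First, domain knowledge is constructed and Google's BERT pre-trained language model is trained on it, yielding model parameters fitted to litigation-case data and pre-trained Chinese character embeddings that serve as the model input; the resulting context-dependent semantic representations improve the contextual quality of the embeddings. Second, a recurrent convolutional neural network encodes the text and captures the information that plays a key role in the text classification task, which improves the extraction of illegal-fact elements. Finally, focal loss is used as the loss function so that training concentrates on hard-to-distinguish samples. Illegal-fact elements are extracted by classifying text labels. Experiments show that the method reaches an F1 score of 86.41% on litigation-case element extraction, outperforming the other methods compared. Injecting in-domain knowledge into the model also improves extraction accuracy.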

Keywords: litigation cases  illegal-fact elements  BERT  pre-training  in-domain knowledge
Received: 2020-07-01
Revised: 2021-01-12

Automatic illegal fact extraction of lawsuit case based on BERT
Cui Bin, Zou Lei, Xu Mingyue. Automatic illegal fact extraction of lawsuit case based on BERT [J]. Science Technology and Engineering, 2021, 21(9): 3669-3675.
Authors: Cui Bin  Zou Lei  Xu Mingyue
Institution: Beijing Jinghang Institute of Computing and Communication
Abstract: Because the extraction of illegal fact elements from lawsuit cases depends on specialized professional knowledge, an automatic extraction method based on BERT is proposed. First, domain knowledge is constructed and the Google BERT pre-trained language model is trained on it, so that model parameters fitted to lawsuit-case data and pre-trained Chinese character embedding vectors are obtained as the input of the model; the resulting contextual representation improves the semantic quality of the embeddings. Second, the text is encoded by a recurrent convolutional neural network, which captures the information that plays a key role in the text classification task. Finally, focal loss is adopted as the loss function to focus on samples that are hard to distinguish. Illegal fact elements are extracted by classifying text labels. Experimental tests show that the F1 value of the method is 86.41%, which is better than that of the other methods compared. Injecting domain knowledge into the model also improves extraction accuracy.
Keywords: lawsuit cases  illegal fact elements  BERT  pre-training  domain knowledge
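
The abstract names focal loss as the training objective but gives neither the formula nor the hyperparameters. The sketch below assumes the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to each label independently and averaged. The function names, the example probabilities, and the values gamma = 2 and alpha = 0.25 (the usual defaults for focal loss) are assumptions for illustration, not settings reported by the authors.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one label: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob   -- predicted probability that the label applies (0..1)
    target -- 1 if the label truly applies, otherwise 0
    gamma, alpha -- assumed defaults, not taken from the paper
    """
    # p_t is the probability the model assigned to the correct outcome.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over all candidate labels of one sentence."""
    losses = [binary_focal_loss(p, t, gamma, alpha)
              for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


# Hypothetical sentence with three candidate fact-element labels.
probs = [0.95, 0.30, 0.10]
targets = [1, 1, 0]

easy = binary_focal_loss(0.95, 1)  # confidently correct prediction
hard = binary_focal_loss(0.30, 1)  # poorly classified positive
sentence_loss = multilabel_focal_loss(probs, targets)
```

With gamma = 0 the modulating factor (1 - p_t)^gamma equals 1 and the expression reduces to alpha-weighted cross-entropy; a larger gamma shrinks the contribution of well-classified labels, which is how the loss keeps training focused on hard samples.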
This article is indexed in CNKI, Wanfang Data, and other databases.