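Fine-grained Entity Recognition based on MacBERT-BiLSTM-CRF in Anti-Terrorism Field
Cite this article:Jiao Kainan, Li Xin, Ye Han, Zhu Rongchen, Sun Haichun. Fine-grained Entity Recognition based on MacBERT-BiLSTM-CRF in Anti-Terrorism Field[J]. Science Technology and Engineering, 2021, 21(29): 12638-12648
Authors:Jiao Kainan  Li Xin  Ye Han  Zhu Rongchen  Sun Haichun
Affiliation:People's Public Security University of China
Funding:Technology Research Program of the Ministry of Public Security (2020JSYJC22ok); National High Technology Research and Development Program of China (863 Program) (2015AA016009); Fundamental Research Funds of People's Public Security University of China (2021JKF215)
Abstract:
To verify the effectiveness of deep-learning-based named entity recognition frameworks in the anti-terrorism domain, a fine-grained anti-terrorism entity label scheme was drawn up with reference to the ACE 2005 entity annotation specification, and an anti-terrorism entity corpus, Anti-Terr-Corpus, was built. An entity recognition model based on MacBERT-BiLSTM-CRF is proposed. Dynamic character vectors are obtained from the pretrained language model MacBERT (masked language modeling as correction bidirectional encoder representations from transformers), which reduces the discrepancy between the pre-training and fine-tuning stages. These vectors are fed into a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF), which encode contextual features and decode the best entity label sequence. Comparative experiments were run by replacing the pretrained language model in the framework. The experiments show that the model can effectively extract important entities from anti-terrorism news. Compared with the BiLSTM-CRF model, adding MacBERT raised the F_1 score by 24.5%; with the encoding-decoding layers kept as BiLSTM-CRF, adding MacBERT raised the F_1 score by 5.1% over adding ALBERT (a lite BERT). Deep learning therefore benefits entity recognition in the anti-terrorism domain: it allows public anti-terrorism news text to serve subsequent assessment of the anti-terrorism situation, and it supports foundational tasks such as information extraction and knowledge graph construction in this domain.

Keywords:deep learning   pretrained language model   entity recognition in the anti-terrorism field   fine-grained entity recognition
Received:2021-05-07
Revised:2021-09-15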
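
The abstract states that the CRF layer decodes the best entity label sequence from the encoded features. As a minimal sketch of that decoding step only, the following pure-Python Viterbi search picks the highest-scoring tag path given per-token emission scores and tag-to-tag transition scores. The tag names (`B-ORG`, `I-ORG`, `O`), the score values, and the function name `viterbi_decode` are illustrative assumptions; the paper's actual label scheme and implementation are not given in this abstract.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag] is the score of `tag` at token t (in the described
    model these would come from the BiLSTM output; here they are made up).
    transitions[(prev, cur)] is the score of moving from prev to cur;
    missing pairs default to 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, NEG_INF) for tag in tags}]
    back = []  # back[t-1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev_best = max(
                tags,
                key=lambda prev: best[t - 1][prev]
                + transitions.get((prev, cur), 0.0),
            )
            scores[cur] = (
                best[t - 1][prev_best]
                + transitions.get((prev_best, cur), 0.0)
                + emissions[t].get(cur, NEG_INF)
            )
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tags and scores, for illustration only
TAGS = ["O", "B-ORG", "I-ORG"]
TRANSITIONS = {("O", "I-ORG"): NEG_INF}  # forbid I-ORG directly after O
EMISSIONS = [
    {"O": 2.0, "B-ORG": 1.5, "I-ORG": 0.0},
    {"O": 0.0, "B-ORG": 0.0, "I-ORG": 3.0},
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

In this toy input, picking the best tag per token independently would give `O` followed by `I-ORG`, which is not a valid BIO sequence. The transition score rules that path out, which is the usual motivation for placing a CRF on top of a token-level encoder.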

Fine-grained Entity Recognition based on MacBERT-BiLSTM-CRF in Anti-Terrorism Field
Jiao Kainan,Li Xin,Ye Han,Zhu Rongchen,Sun Haichun. Fine-grained Entity Recognition based on MacBERT-BiLSTM-CRF in Anti-Terrorism Field[J]. Science Technology and Engineering, 2021, 21(29): 12638-12648
Authors:Jiao Kainan  Li Xin  Ye Han  Zhu Rongchen  Sun Haichun
Affiliation:People's Public Security University of China (PPSUC)
Abstract:
To verify the effectiveness of deep-learning-based named entity recognition in the anti-terrorism field, a fine-grained anti-terrorism entity label system was developed with reference to the ACE 2005 entity annotation specification, and a tailored corpus, Anti-Terr-Corpus, was constructed. An entity recognition model based on MacBERT-BiLSTM-CRF is proposed. MacBERT, a pretrained language model that reduces the discrepancy between the pre-training and fine-tuning stages, produces dynamic character vector representations, which are passed to BiLSTM and CRF layers for contextual feature encoding and decoding to obtain the optimal entity tags. Comparative experiments were carried out by replacing the pretrained language model in the framework. The experiments show that the model can effectively extract important entities from anti-terrorism news. Compared with the BiLSTM-CRF model, adding MacBERT increased the F1 score by 24.5%. With the encoding-decoding layers kept as BiLSTM-CRF, MacBERT achieved an F1 score 5.1% higher than ALBERT. It is concluded that deep learning benefits entity recognition in the anti-terrorism field, that open anti-terrorism news text can serve subsequent anti-terrorism situation prediction, and that the approach is helpful for basic tasks such as information extraction and knowledge graph construction in this field.
Keywords:deep learning   pretrained language model   entity recognition in the anti-terrorism field   fine-grained entity recognition
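
The abstract reports its comparisons as F1 scores. As a rough illustration of how an entity-level F1 is commonly computed for BIO-tagged sequences, the sketch below counts a predicted entity as correct only when its span and type both match the gold annotation exactly. This exact-match protocol, the tag names, and the helper names `extract_entities` and `precision_recall_f1` are assumptions for illustration; the abstract does not state which evaluation protocol the authors used.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_entities(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # still inside the current entity
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An I- tag with no matching open entity is ignored
    return spans


def precision_recall_f1(
    gold: List[str], pred: List[str]
) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span matching."""
    gold_spans = extract_entities(gold)
    pred_spans = extract_entities(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for one five-token sentence
gold_tags = ["B-ORG", "I-ORG", "O", "B-LOC", "O"]
pred_tags = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]
precision, recall, f1 = precision_recall_f1(gold_tags, pred_tags)
```

Here the organization span is matched exactly while the location span is shifted by one token, so one of two predicted entities and one of two gold entities are counted as correct.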
This article is indexed in CNKI and other databases.