Research on Named Entity Recognition of News Texts in Ten ASEAN Countries
Original title (Chinese):东盟十国新闻文本的命名实体识别研究
Cite as:Zheng Yanbin, Xia Zhichao, Guo Zhi, Huang Yongzhong, Liu Wenfen. Research on Named Entity Recognition of News Texts in Ten ASEAN Countries[J]. Science Technology and Engineering, 2018, 18(35)
Authors:Zheng Yanbin (郑彦斌)  Xia Zhichao (夏志超)  Guo Zhi (郭智)  Huang Yongzhong (黄永忠)  Liu Wenfen (刘文芬)
Affiliation:Guilin University of Electronic Technology (all five authors)
Funding:National Natural Science Foundation of China; Natural Science Foundation of Guangxi Province; Guangxi Key Laboratory of Cryptography and Information Security project
Abstract:To construct a knowledge graph of the ten ASEAN member states, named entity recognition must be performed on the relevant texts. A neural network model based on a bidirectional GRU-CRF architecture is designed to recognize named entities in Chinese-language news from the Chinese embassies in the ten ASEAN member states. Pre-trained domain word vectors serve as input, a bidirectional GRU network extracts semantic features from the vectorized text, and a CRF layer then predicts and outputs the optimal tag sequence. To further improve the results, two hidden layers are added between the bidirectional GRU and the CRF layer. For data preprocessing, a dataset partitioning algorithm is proposed to split the text in a more principled way. The model is compared with several hybrid models on the ASEAN dataset, and the results show that it achieves better recognition performance on person, location, and organization names.

Keywords:bidirectional GRU-CRF; named entity recognition; ten ASEAN member states; knowledge graph
Received:2018-08-16
Revised:2018-10-11
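
The abstract says the CRF layer "predicts and outputs the optimal tag sequence". A common way to do this is Viterbi decoding over per-token emission scores and tag-to-tag transition scores. The sketch below is not the authors' implementation: the BIO tag set, the function name `viterbi_decode`, and all scores are illustrative assumptions, and the emission scores that would come from the bidirectional GRU and hidden layers are hard-coded here.

```python
from typing import Dict, List, Sequence


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag]  : score of `tag` at token t (assumed network output).
    transitions[a][b]  : score of moving from tag a to tag b.
    """
    if not emissions:
        return []
    # best[tag] = score of the best path that ends in `tag` at the current token
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score into `cur`.
            prev = max(tags, key=lambda p: best[p] + transitions[p][cur])
            new_best[cur] = best[prev] + transitions[prev][cur] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy scores for a three-token sentence (assumed BIO location tags).
tags = ["O", "B-LOC", "I-LOC"]
transitions = {
    "O":     {"O": 0.0, "B-LOC": 0.0,  "I-LOC": -10.0},  # O -> I-LOC is penalized
    "B-LOC": {"O": 0.0, "B-LOC": -1.0, "I-LOC": 1.0},
    "I-LOC": {"O": 0.0, "B-LOC": -1.0, "I-LOC": 0.5},
}
emissions = [
    {"O": 2.0, "B-LOC": 0.5, "I-LOC": 0.0},
    {"O": 0.4, "B-LOC": 1.0, "I-LOC": 1.2},
    {"O": 0.2, "B-LOC": 0.1, "I-LOC": 1.5},
]

print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-LOC', 'I-LOC']
```

Picking the top-scoring tag for each token independently would label the second token `I-LOC` directly after `O`, which is not a well-formed span. Decoding over the whole sequence lets the transition scores rule that out, which is the usual motivation for placing a CRF layer on top of a recurrent encoder.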