
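Named Entity Recognition Algorithm Enhanced with Entity Category Information
Cite this article:LIU Minghui, TANG Wangjing, XU Bin, TONG Meihan, WANG Liming, ZHONG Qi, XU Jianjun. Named Entity Recognition Algorithm Enhanced with Entity Category Information[J]. Journal of Applied Sciences, 2023, 41(1): 1-9.
Authors:LIU Minghui  TANG Wangjing  XU Bin  TONG Meihan  WANG Liming  ZHONG Qi  XU Jianjun
Affiliations:1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; 2. China Research Institute for Science Popularization, Beijing 100081; 3. Beijing Caizhi Technology Co., Ltd., Beijing 100081
Funding:Supported by the China Research Institute for Science Popularization Cooperation Project Fund (No. 200110EMR028)
Abstract:Character-level models for Chinese named entity recognition (NER) ignore the word information in a sentence. To address this, a Chinese NER method enhanced with entity category information from a knowledge graph is proposed. First, a word segmentation tool is applied to the training set, and all possible words are selected to build a vocabulary. Second, a general-purpose knowledge graph is queried for the category information of the entities in the vocabulary, a word set related to each character is built in a simple and effective way, and an entity category information set is generated from the categories of the entities in that word set. Finally, a word embedding method converts the category information set into an embedding that is concatenated with the character embedding, enriching the features produced by the embedding layer. The proposed method can serve as a module that expands the feature diversity of the embedding layer, and it can also be combined with a variety of encoder-decoder models. Experiments on the Chinese NER dataset released by Microsoft Research Asia demonstrate the superiority of the model: compared with the bidirectional long short-term memory network and the bidirectional long short-term memory network plus conditional random field model, F1 improves by 11.00% and 3.09%, respectively, which verifies that entity category information from a knowledge graph is effective for enhancing Chinese NER.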

Keywords:named entity recognition  knowledge graph  entity category information  knowledge enhancement
Received:2022-06-30

Named Entity Recognition Algorithm Enhanced with Entity Category Information
LIU Minghui, TANG Wangjing, XU Bin, TONG Meihan, WANG Liming, ZHONG Qi, XU Jianjun. Named Entity Recognition Algorithm Enhanced with Entity Category Information[J]. Journal of Applied Sciences, 2023, 41(1): 1-9.
Authors:LIU Minghui  TANG Wangjing  XU Bin  TONG Meihan  WANG Liming  ZHONG Qi  XU Jianjun
Institution:1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;2. China Research Institute for Science Popularization, Beijing 100081, China;3. Beijing Caizhi Technology Co., Ltd., Beijing 100081, China
Abstract:Character-level models for Chinese named entity recognition (NER) tend to ignore the word information in sentences. To address this problem, a Chinese NER method enhanced with entity category information from a knowledge graph is proposed. First, the training set is segmented with a word segmentation tool, and all possible words are selected to construct a vocabulary. Second, the category information of the entities in the vocabulary is retrieved from a general-purpose knowledge graph, a word set related to each character is constructed in a simple and effective way, and an entity category information set is generated from the categories of the entities in that word set. Finally, a word embedding method converts the category information set into embeddings, which are concatenated with the character embeddings to enrich the features of the embedding layer. The proposed method can be used as a module that expands the feature diversity of the embedding layer, and it can also be combined with a variety of encoder-decoder models. Experiments on the Chinese NER dataset released by Microsoft Research Asia (MSRA) show the superiority of the proposed model. Compared with the bidirectional long short-term memory (Bi-LSTM) model and the Bi-LSTM plus conditional random field (CRF) model, the proposed method increases F1 by 11.00% and 3.09%, respectively, verifying that the category information of entities in a knowledge graph is effective for enhancing Chinese NER.
Keywords:named entity recognition (NER)  knowledge graph  entity category information  knowledge enhancement  
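
The three steps in the abstract (build per-character word sets, look up entity categories, concatenate category embeddings with character embeddings) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy dictionary `TOY_KG` stands in for a general-purpose knowledge graph, substring matching stands in for the segmentation-derived vocabulary, the word set of a character is taken to be every matched word that covers it, category embeddings are mean-pooled, and the embedding tables are random rather than trained.

```python
import random

# Hypothetical stand-in for a knowledge graph: word -> entity categories.
TOY_KG = {
    "清华大学": ["organization", "university"],
    "清华": ["organization"],
    "北京": ["location", "city"],
}
CATEGORIES = sorted({c for cats in TOY_KG.values() for c in cats})
DIM_CHAR, DIM_CAT = 4, 3  # illustrative embedding sizes


def make_table(keys, dim, seed):
    """Random embedding table; a real model would learn these vectors."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1, 1) for _ in range(dim)] for k in keys}


def char_category_sets(sentence, kg, max_len=4):
    """For each character, collect the categories of every KG word covering it."""
    sets = [set() for _ in sentence]
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
            for cat in kg.get(sentence[start:end], []):
                for i in range(start, end):
                    sets[i].add(cat)
    return sets


def embed(sentence, kg, char_table, cat_table):
    """Concatenate each character embedding with its pooled category embedding."""
    features = []
    for ch, cats in zip(sentence, char_category_sets(sentence, kg)):
        if cats:
            vecs = [cat_table[c] for c in sorted(cats)]
            cat_vec = [sum(col) / len(vecs) for col in zip(*vecs)]  # mean pooling
        else:
            cat_vec = [0.0] * DIM_CAT  # no matching entity: zero vector
        features.append(char_table[ch] + cat_vec)
    return features


sentence = "清华大学在北京"
char_table = make_table(sorted(set(sentence)), DIM_CHAR, seed=0)
cat_table = make_table(CATEGORIES, DIM_CAT, seed=1)
cat_sets = char_category_sets(sentence, TOY_KG)
features = embed(sentence, TOY_KG, char_table, cat_table)
```

The resulting per-character vectors have length `DIM_CHAR + DIM_CAT` and could be fed to any sequence encoder, such as a Bi-LSTM followed by a CRF, which is the sense in which the abstract describes the method as an embedding-layer module.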