
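Named entity recognition for materials data combining contextual lexical matching and graph convolution
Cite this article: CHEN Qian, WU Xing. Named entity recognition for materials data combining contextual lexical matching and graph convolution[J]. Journal of Shanghai University (Natural Science), 2021, 28(3): 372-385.
Authors: CHEN Qian  WU Xing
Affiliations: 1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444; 2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444; 3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang
Funding: National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)
Abstract: The materials literature contains a wealth of knowledge, and mining it with machine learning and natural language processing is an active research topic. Named entity recognition (NER) is the first step toward efficiently mining and extracting information from data. Existing NER methods have two shortcomings: their vector representations cannot handle polysemy, and their models tend to extract contextual features while ignoring global features. To address these problems, an NER method based on contextual lexical matching and graph convolution is proposed. The method first uses XLNet to obtain dynamic contextual features of the text. It then uses a long short-term memory network and a graph convolutional network (GCN) built on words matched in the text's context to obtain contextual and global features, respectively. Finally, a conditional random field outputs the label sequence. Validation on two different corpora shows that, on the materials dataset, the method achieves a precision, recall, and F1 score of 90.05%, 88.67%, and 89.36%, respectively, effectively improving NER accuracy.

Keywords: named entity recognition  XLNet  graph convolutional network
Received: 2022-03-15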
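
The abstract does not say how the matched words are turned into a graph, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes character-level nodes, edges between adjacent characters, and an extra edge linking the first and last character of every lexicon match, and then applies one standard GCN propagation step, ReLU(Â H W), with symmetric normalization. The lexicon, the sample string, the toy feature vectors, and all function names are hypothetical; in the described method the node features would come from XLNet rather than from the placeholder vectors used here.

```python
import math

def match_lexicon(text, lexicon):
    """Return (start, end) spans, end exclusive, of lexicon words found in text."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 2, len(text) + 1):  # words of length >= 2
            if text[start:end] in lexicon:
                spans.append((start, end))
    return spans

def build_adjacency(n, spans):
    """Symmetric adjacency: self-loops, sequential edges, matched-word edges."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0                          # self-loop
        if i + 1 < n:
            adj[i][i + 1] = adj[i + 1][i] = 1.0  # adjacent characters
    for start, end in spans:
        # Assumed scheme: link the first and last character of each match.
        adj[start][end - 1] = adj[end - 1][start] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(norm_adj, feats, weight):
    """One propagation step: ReLU(norm_adj @ feats @ weight)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    agg = [[sum(norm_adj[i][k] * feats[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Toy usage with a hypothetical lexicon and placeholder 2-d features.
text = "NiTi alloy"
lexicon = {"NiTi", "alloy", "NiTi alloy"}
spans = match_lexicon(text, lexicon)
adj = build_adjacency(len(text), spans)
norm_adj = normalize(adj)
feats = [[1.0, 0.0] if i % 2 == 0 else [0.0, 1.0] for i in range(len(text))]
weight = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(norm_adj, feats, weight)
```

The matched-word edges let a character receive information from the far end of a word in a single step, which is one way a graph branch could supply longer-range features alongside the sequential LSTM branch.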

Material data named entity recognition based on matching contextual lexical words and graph convolution
CHEN Qian, WU Xing. Material data named entity recognition based on matching contextual lexical words and graph convolution[J]. Journal of Shanghai University (Natural Science), 2021, 28(3): 372-385.
Authors:CHEN Qian  WU Xing
Affiliation:1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China;2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China;3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
Abstract: The materials literature contains abundant knowledge, and mining it with machine learning and natural language processing is an active research topic. Named entity recognition (NER) is the first step toward efficiently mining and extracting information from data. Existing NER methods use vector representations that cannot handle polysemy, and their models often extract contextual features while disregarding global features. To address these problems, an NER method based on matching contextual lexical words and graph convolution is proposed herein. First, the dynamic contextual features of the text are obtained using XLNet. Second, the contextual and global features are obtained using a long short-term memory network and a graph convolutional network (GCN) combined with contextual lexical words of the text, respectively. Finally, a sequence of labels is output via a conditional random field. The model is validated on two different datasets. On the materials dataset, the precision, recall, and F1 score are 90.05%, 88.67%, and 89.36%, respectively, indicating that the method effectively improves NER accuracy.
Keywords:named entity recognition (NER)  XLNet  graph convolutional network (GCN)  
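
The three reported scores can be checked against each other, because F1 is the harmonic mean of precision and recall. The short Python sketch below recomputes F1 from the rounded precision and recall given in the abstract. The function name is illustrative, and the inputs are the published two-decimal values, not the underlying counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract, in percent.
reported_p, reported_r, reported_f1 = 90.05, 88.67, 89.36

f1 = f1_score(reported_p, reported_r)
gap = abs(f1 - reported_f1)
print(f"recomputed F1 = {f1:.4f}, gap to reported = {gap:.4f}")
```

The recomputed value lies within 0.01 percentage points of the reported F1, a gap of the size that rounding precision and recall to two decimals can produce.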