
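A Named Entity Recognition Method Incorporating Stroke Features
Cite this article: 蒋丽媛, 吴亚东, 王书航, 张巍瀚, 李懿. A named entity recognition method incorporating stroke features [J]. 科学技术与工程 (Science Technology and Engineering), 2023, 23(17): 7436-7443
Authors: 蒋丽媛, 吴亚东, 王书航, 张巍瀚, 李懿
Affiliation: 四川轻化工大学 (Sichuan University of Science and Engineering)
Funding: Talent Introduction Project of Sichuan University of Science and Engineering (2020RC20)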
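Abstract:
Chinese characters are pictographic, and their glyph features play an important role in Chinese named entity recognition. To address the low recognition accuracy obtained when a bi-directional long short-term memory (BiLSTM) model is used to extract radical features, a stroke composition encoder is proposed to capture the glyph features of Chinese characters. The stroke-based glyph feature vectors are concatenated with the character vectors output by the pre-trained language representation model BERT (bidirectional encoder representation from transformers), and the concatenated vectors are fed into a tagging model that connects a BiLSTM to a conditional random field (CRF), referred to as BiLSTM-CRF, for named entity recognition. Experiments show that the proposed method significantly improves named entity recognition accuracy on the Resume dataset. Compared with using a convolutional neural network as the encoder for glyph features, accuracy is 0.4% higher. Compared with the model that uses BiLSTM-extracted radical features and with the lexicon-augmented Lattice LSTM model, accuracy improves by 4.2% and 0.8%, respectively.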

Keywords: glyph features; Chinese named entity recognition; BiLSTM-CRF; stroke composition encoder; dynamic word vectors
Received: 2022-06-07
Revised: 2023-06-09

A named entity recognition method incorporating stroke features
Jiang Liyuan,Wu Yadong,Wang Shuhang,Zhang Weihan,Li Yi. A named entity recognition method incorporating stroke features[J]. Science Technology and Engineering, 2023, 23(17): 7436-7443
Authors:Jiang Liyuan  Wu Yadong  Wang Shuhang  Zhang Weihan  Li Yi
Affiliation:Sichuan University of Science and Engineering
Abstract:
Chinese characters are pictographs, and their glyph features play an important role in Chinese named entity recognition. To address the low named entity recognition accuracy achieved when radical features are extracted with a bi-directional long short-term memory (BiLSTM) model, a stroke composition encoder is proposed to obtain the glyph features of Chinese characters. The stroke-based glyph feature vectors are concatenated with the character vectors output by the pre-trained language representation model BERT (bidirectional encoder representation from transformers), and the concatenated vectors are fed into a tagging model in which a BiLSTM is connected to a conditional random field (CRF), namely BiLSTM-CRF, for named entity recognition. Experiments show that the proposed method significantly improves the accuracy of named entity recognition on the Resume dataset. Compared with using a convolutional neural network as the encoder to extract glyph features, the accuracy is 0.4% higher. The accuracy is 4.2% and 0.8% higher than that of the model using BiLSTM-extracted radical features and the lexicon-augmented Lattice LSTM model, respectively.
Keywords: glyph features; Chinese named entity recognition; BiLSTM-CRF; stroke composition encoder; dynamic word vectors
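
The fusion step described in the abstract (a per-character glyph vector derived from strokes, concatenated with a contextual character vector before the tagger) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not specify the internals of the stroke composition encoder, so mean pooling over randomly initialised stroke embeddings stands in for it, the five-class stroke inventory and the names `make_stroke_table`, `encode_strokes`, and `fuse` are hypothetical, and short hand-written lists stand in for BERT outputs. The BiLSTM-CRF tagger that would consume the fused vectors is not shown.

```python
import random

# Hypothetical stroke inventory: five basic stroke classes (assumption, not from the paper).
STROKES = ["heng", "shu", "pie", "na", "zhe"]


def make_stroke_table(dim, seed=0):
    """Build a randomly initialised embedding table with one vector per stroke class."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in STROKES}


def encode_strokes(strokes, table):
    """Mean-pool stroke embeddings into one glyph vector.

    This is only a stand-in for a stroke composition encoder; note that mean
    pooling ignores stroke order, which a real encoder may well use.
    """
    dim = len(next(iter(table.values())))
    if not strokes:
        return [0.0] * dim
    total = [0.0] * dim
    for s in strokes:
        for i, v in enumerate(table[s]):
            total[i] += v
    return [t / len(strokes) for t in total]


def fuse(char_vectors, stroke_seqs, table):
    """Concatenate each contextual character vector with its glyph vector."""
    if len(char_vectors) != len(stroke_seqs):
        raise ValueError("need one stroke sequence per character")
    return [cv + encode_strokes(ss, table) for cv, ss in zip(char_vectors, stroke_seqs)]


table = make_stroke_table(dim=4)
# Placeholder 3-dimensional vectors standing in for BERT character vectors.
char_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Stroke sequences for the characters 十 and 大.
stroke_seqs = [["heng", "shu"], ["heng", "pie", "na"]]
fused = fuse(char_vectors, stroke_seqs, table)
print(len(fused), len(fused[0]))  # prints: 2 7
```

Each fused vector has length 3 + 4 = 7, the contextual part followed by the glyph part. In a full pipeline a sequence of such vectors would be the per-character input to the tagging model.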