Entity Category Balance Optimization Algorithm Based on Improved Loss Function
Cite this article: ZHANG Fengxi, WU Chengchu, ZHANG Yunze, DONG Luobing. Entity Category Balance Optimization Algorithm Based on Improved Loss Function[J]. Guangxi Sciences, 2023, 30(1): 100-105.
Authors: ZHANG Fengxi  WU Chengchu  ZHANG Yunze  DONG Luobing
Institution: School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710071, China; College of Computer Science & Technology, Xidian University, Xi'an, Shaanxi 710071, China
Funding: Supported by the National Undergraduate Training Program for Innovation and Entrepreneurship (202110701085).
Abstract: Named Entity Recognition (NER), a Natural Language Processing (NLP) task, suffers from imbalance among entity-category samples. To address this problem, an entity category balance optimization algorithm based on an improved loss function is proposed. The algorithm optimizes the loss function of the neural network model. After analyzing the characteristics of NER data, it introduces a smoothing coefficient and a weight coefficient on top of balancing positive and negative samples, so that during gradient propagation the model pays more attention to hard-to-recognize samples (those from entity categories with few samples and those containing nested entities) and less attention to abundant, easy-to-recognize samples. In comparative experiments on the public datasets ACE05 and MSRA, the improved loss function raises the F1 score by 1.53% and 0.91%, respectively. These results indicate that the improved loss function alleviates the imbalance between positive and negative samples and between hard and easy samples among entities.

Keywords: natural language processing; named entity recognition; loss function; smoothing coefficient; neural networks; hard and easy samples
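
The abstract does not state the formula of the improved loss, so the following is only a minimal sketch of the general idea it describes: a per-class weight coefficient that favors rare entity categories, combined with a modulating factor that shrinks the loss of easy, confidently classified samples. The focal-style form, the function names, and the parameters `gamma` and `smooth` are all assumptions made for illustration. In particular, `smooth` is used here only as a small numerical-stability term and may not correspond to the smoothing coefficient defined in the paper.

```python
import math


def inverse_frequency_weights(label_counts):
    """Hypothetical weighting scheme: rarer classes get larger weights.

    label_counts[i] is the number of training samples of class i.
    """
    total = sum(label_counts)
    n_classes = len(label_counts)
    return [total / (n_classes * count) for count in label_counts]


def weighted_focal_loss(probs, target, class_weights, gamma=2.0, smooth=1e-6):
    """Focal-style loss for a single token (an assumed form, not the paper's).

    probs         -- predicted class probabilities for the token
    target        -- index of the gold class
    class_weights -- per-class weight coefficients
    gamma         -- focusing exponent; gamma=0 gives weighted cross-entropy
    smooth        -- small constant that keeps the logarithm finite
    """
    p_t = probs[target]
    # (1 - p_t)^gamma is close to 0 for easy samples and close to 1 for hard ones.
    modulating = (1.0 - p_t) ** gamma
    return -class_weights[target] * modulating * math.log(p_t + smooth)


# Class 0 is frequent (e.g. non-entity), class 1 is a rare entity category.
weights = inverse_frequency_weights([90, 10])
easy = weighted_focal_loss([0.95, 0.05], target=0, class_weights=weights)
hard = weighted_focal_loss([0.70, 0.30], target=1, class_weights=weights)
print(f"easy sample loss: {easy:.4f}, hard sample loss: {hard:.4f}")
```

In this sketch the frequent, confidently predicted sample contributes almost nothing to the loss, while the rare, misclassified sample dominates it, which is the behavior the abstract attributes to the improved loss function.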