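Research on Automatic Extraction of Scientific and Technical Terms Based on the Self-Attention Mechanism (基于自注意力机制的科技术语自动提取技术研究)
Cite this article: ZHAO Songge, ZHANG Hao, CHANG Baobao. Research on Automatic Extraction of Scientific and Technical Terms Based on the Self-Attention Mechanism[J]. 中国科技术语, 2021, 23(2): 20-26.
Authors: ZHAO Songge, ZHANG Hao, CHANG Baobao
Affiliations: 1. Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; 2. School of Software and Microelectronics, Peking University, Beijing 102600
Funding: Research project of the China National Committee for Terminology in Science and Technology, "Research on Scientific and Technical Term Extraction Based on Deep Learning"; National Natural Science Foundation of China project, "Research on Data-to-Text Generation Based on Deep Learning"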
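Abstract: Extracting scientific and technical terms is an important step in the automatic processing of such terms, and it matters for downstream tasks such as machine translation, information retrieval, and question answering (QA). Traditional manual term extraction consumes a great deal of labor. One automatic approach recasts term extraction as a sequence labeling problem and trains a tagging model by supervised learning, but it is held back by the lack of a large-scale annotated corpus of scientific and technical terms. This paper introduces distant supervision to generate a large-scale annotated training corpus, and it proposes a Bi-LSTM model architecture based on the self-attention mechanism to improve term extraction results. The new model is found to be far better than a traditional machine learning model (CRF) at discovering new scientific and technical terms.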

Keywords: scientific and technical term extraction; distant supervision; self-attention
Received: 2020-12-16

Research on Automatic Extraction of Scientific Terminology from Texts Based on Self-Attention
ZHAO Songge, ZHANG Hao, CHANG Baobao. Research on Automatic Extraction of Scientific Terminology from Texts Based on Self-Attention[J]. Chinese Science and Technology Terms Journal, 2021, 23(2): 20-26.
Authors:ZHAO Songge  ZHANG Hao  CHANG Baobao
Abstract: Scientific terminology uses specific words to represent particular scientific concepts. Extracting scientific terminology is an important part of the automatic processing of such terminology, and it is of great significance for downstream tasks such as machine translation, information retrieval, and question answering. Traditional manual extraction of scientific terminology is labor-intensive. One automatic method recasts terminology extraction as a sequence tagging problem and trains a tagging model through supervised learning, but it suffers from the lack of a large-scale annotated corpus of scientific terminology. This paper introduces distant supervision to generate a large-scale annotated training corpus and proposes a Bi-LSTM model architecture based on the self-attention mechanism to improve the extraction results. We find that the new model is far superior to a traditional machine learning model (CRF) at discovering new scientific terminology.
Keywords: scientific terminology extraction; distant supervision; self-attention
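
The abstract describes treating term extraction as sequence tagging, with training labels produced by distant supervision rather than by hand. It does not say how the labels are produced, so the following is only a minimal sketch of one common way to do it, assuming an existing term lexicon is matched against tokenized text to emit BIO tags. The function name `distant_bio_tags`, the `max_len` parameter, the toy lexicon, and the greedy longest-match rule are all illustrative assumptions, not the authors' procedure.

```python
from typing import List, Set, Tuple


def distant_bio_tags(
    tokens: List[str],
    lexicon: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a term lexicon.

    Hypothetical helper: spans found in `lexicon` are tagged B/I,
    every other token is tagged O.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched  # skip past the matched term
        else:
            i += 1
    return tags


# Toy lexicon and sentence, invented for illustration only.
lexicon = {
    ("self-attention",),
    ("machine", "translation"),
    ("conditional", "random", "field"),
}
tokens = "we apply self-attention to machine translation".split()
tags = distant_bio_tags(tokens, lexicon)
print(list(zip(tokens, tags)))
```

Labels produced this way are noisy: terms missing from the lexicon are tagged O. That gap is where a tagging model's ability to discover new terms, the comparison the abstract draws against CRF, becomes relevant.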
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).