
Bilingual Terminology Alignment Based on Chinese-English Monolingual Terminological Bank
Chinese title: 基于中英文单语术语库的双语术语对齐方法
Citation: XIANG Lu, ZHOU Yu, ZONG Chengqing. Bilingual Terminology Alignment Based on Chinese-English Monolingual Terminological Bank[J]. Chinese Science and Technology Terms Journal (中国科技术语), 2022, 24(1): 14-25.
Authors: XIANG Lu, ZHOU Yu, ZONG Chengqing
Affiliations: 1. State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; 3. Fanyu AI Research Institute / Beijing Zhongke Fanyu Technology Co., Ltd., Beijing 100080
Received: 2021-07-30
Revised: 2021-10-09

Abstract: Bilingual terminology banks are essential resources in natural language processing and are of great significance for multilingual applications such as cross-lingual information retrieval and machine translation. Bilingual term pairs are usually obtained either by human translation or by automatic extraction from bilingual parallel corpora. However, human translation requires domain expertise and is time-consuming and labor-intensive, and large bilingual parallel corpora are hard to obtain for a specific domain. In contrast, monolingual terminology banks in different languages for the same domain are relatively easy to obtain. This paper therefore proposes a method that automatically aligns terms from the monolingual terminology banks of two languages in order to build a bilingual term lookup table. First, the outputs of multiple online machine translation engines are combined through a voting mechanism to generate a target-side pseudo-term. Second, the pseudo-term is used to retrieve a candidate set of target-side terms from the target terminology bank. Finally, an mBERT-based semantic matching model re-ranks the candidate set to yield the final bilingual term pair. Chinese-English bilingual terminology alignment experiments in three domains, namely computer science, civil engineering, and medicine, show that the proposed method improves the accuracy of bilingual terminology extraction.
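The three-step pipeline described in the abstract (MT voting, candidate retrieval, semantic re-ranking) can be sketched roughly as follows. This is a minimal, stdlib-only illustration under our own assumptions, not the authors' implementation: the MT outputs are passed in as plain strings instead of being fetched from real online engines, the vote is an exact-match majority vote after lowercasing, candidate retrieval uses character-trigram Jaccard similarity (the abstract does not say which retrieval method is used), and the `score` callback is a hypothetical placeholder for the mBERT-based semantic matching model. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set


def normalize(term: str) -> str:
    # Lowercase and collapse whitespace so trivially different MT outputs match.
    return " ".join(term.lower().split())


def vote_pseudo_term(mt_outputs: Sequence[str]) -> str:
    """Assumed voting scheme: majority vote over normalized MT outputs.

    Ties are broken by the order in which the engines' outputs were given.
    """
    normed = [normalize(t) for t in mt_outputs]
    if not normed:
        raise ValueError("no MT outputs given")
    counts = Counter(normed)
    best = max(counts.values())
    return next(t for t in normed if counts[t] == best)


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    # Pad with boundary markers so short terms still yield at least one n-gram.
    padded = f"#{text}#"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def jaccard(a: str, b: str) -> float:
    # Surface similarity between two terms (assumed retrieval metric).
    x, y = char_ngrams(normalize(a)), char_ngrams(normalize(b))
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def retrieve_candidates(pseudo: str, target_bank: Sequence[str], k: int = 5) -> List[str]:
    # Rank every target-language term by similarity to the pseudo-term; keep top k.
    ranked = sorted(target_bank, key=lambda t: jaccard(pseudo, t), reverse=True)
    return ranked[:k]


def rerank(source: str, candidates: Sequence[str],
           score: Callable[[str, str], float]) -> List[str]:
    # score(source, candidate) is a placeholder for the mBERT-based matcher.
    return sorted(candidates, key=lambda c: score(source, c), reverse=True)


def align(source: str, mt_outputs: Sequence[str], target_bank: Sequence[str],
          score: Callable[[str, str], float], k: int = 5) -> Optional[str]:
    pseudo = vote_pseudo_term(mt_outputs)                # step 1: voting
    candidates = retrieve_candidates(pseudo, target_bank, k)  # step 2: retrieval
    ranked = rerank(source, candidates, score)           # step 3: re-ranking
    return ranked[0] if ranked else None


if __name__ == "__main__":
    mt_outputs = ["machine translation", "Machine Translation", "machine interpretation"]
    bank = ["machine learning", "translation memory", "machine translation", "civil engineering"]

    # Toy stand-in for the mBERT matcher: best trigram overlap with any MT output.
    def toy_score(source: str, candidate: str) -> float:
        return max(jaccard(m, candidate) for m in mt_outputs)

    print(align("机器翻译", mt_outputs, bank, toy_score, k=2))  # machine translation
```

Because `rerank` only needs a pairwise scoring function, a real cross-lingual matcher (for example, a fine-tuned multilingual BERT classifier) could in principle replace `toy_score` without changing the voting or retrieval steps.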
Keywords: bilingual terminology; monolingual terminological bank; terminology alignment; semantic matching
This article is indexed in VIP (维普) and other databases.