Analysis of Bi-directional Reranking Model for Uyghur-Chinese Neural Machine Translation
Cite this article: ZHANG Xinlu, LI Xiao, YANG Yating, WANG Lei, DONG Rui. Analysis of Bi-directional Reranking Model for Uyghur-Chinese Neural Machine Translation[J]. Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版)), 2020, 56(1): 31-38. DOI: 10.13209/j.0479-8023.2019.093
Authors: ZHANG Xinlu  LI Xiao  YANG Yating  WANG Lei  DONG Rui
Affiliation: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011; 2. University of Chinese Academy of Sciences, Beijing 100049; 3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011
Funding: Supported by the Open Project of the Key Laboratory of Xinjiang Uygur Autonomous Region (2018D04018), the National Natural Science Foundation of China (U1703133), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472), and the High-level Talent Introduction Project of Xinjiang Uygur Autonomous Region (Y839031201)
Abstract: On low-resource corpora such as Uyghur-to-Chinese, training a neural machine translation model easily falls into a local optimum, so the translation produced by a single model may not be the globally optimal one. To address this problem, an ensemble strategy integrates the probability distributions predicted by multiple models, so that the translation models act as a whole. In addition, a cross-entropy-based reranking method combines translation models with opposite decoding directions, and the candidate translation with the highest combined score is selected as the output. Experiments on the CWMT2015 Uyghur-Chinese parallel corpus show that the proposed method improves on a single Transformer model by 4.82 BLEU points.

Keywords: neural machine translation  ensemble learning  bi-directional reranking  Uyghur
Received: 2019-06-02
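
The abstract names two steps without giving formulas: integrating the probability distributions of several models, and reranking candidates with models that decode in opposite directions. The sketch below illustrates one plausible reading and should not be taken as the authors' implementation. It assumes an arithmetic mean of next-token distributions for the ensemble, and a weighted sum of length-normalised log-likelihoods (the negative per-token cross-entropy) from a left-to-right and a right-to-left model for reranking. The function names, the `weight` parameter, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def ensemble_distribution(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average the next-token distributions of several models.

    Assumption: a plain arithmetic mean; the paper may weight models differently.
    """
    n_models = len(distributions)
    vocab_size = len(distributions[0])
    return [sum(d[i] for d in distributions) / n_models for i in range(vocab_size)]


def sentence_score(token_log_probs: Sequence[float]) -> float:
    """Length-normalised log-likelihood, i.e. the negative per-token cross-entropy."""
    return sum(token_log_probs) / len(token_log_probs)


def rerank(
    candidates: Sequence[str],
    l2r_log_probs: Dict[str, Sequence[float]],
    r2l_log_probs: Dict[str, Sequence[float]],
    weight: float = 0.5,
) -> str:
    """Return the candidate with the highest combined score.

    l2r_log_probs / r2l_log_probs map each candidate to the token log-probabilities
    assigned by a left-to-right and a right-to-left model (hypothetical inputs).
    `weight` is an assumed interpolation factor between the two directions.
    """
    if not candidates:
        raise ValueError("no candidates to rerank")
    best, best_score = candidates[0], -math.inf
    for cand in candidates:
        score = (weight * sentence_score(l2r_log_probs[cand])
                 + (1.0 - weight) * sentence_score(r2l_log_probs[cand]))
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Two toy models over a three-word vocabulary.
    print(ensemble_distribution([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]))

    # Toy candidate scores from the two decoding directions.
    l2r = {"cand_a": [math.log(0.5), math.log(0.5)],
           "cand_b": [math.log(0.6), math.log(0.4)]}
    r2l = {"cand_a": [math.log(0.3), math.log(0.3)],
           "cand_b": [math.log(0.5), math.log(0.5)]}
    print(rerank(["cand_a", "cand_b"], l2r, r2l))
```

In the toy data the left-to-right model alone slightly prefers `cand_a`, while the right-to-left model strongly prefers `cand_b`, so the combined score selects `cand_b`. This is the kind of disagreement between decoding directions that bi-directional reranking is meant to resolve.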
This article is indexed in CNKI and other databases.