Uyghur-Chinese neural machine translation method based on back translation and ensemble learning

Citation: FENG Xiao, YANG Ya-ting, DONG Rui, AZMAT Anwar, MA Bo. Uyghur-Chinese neural machine translation method based on back translation and ensemble learning[J]. Journal of Lanzhou University of Technology, 2022, 48(5): 99.

Authors: FENG Xiao  YANG Ya-ting  DONG Rui  AZMAT Anwar  MA Bo

Institution: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China

Foundation items: National Natural Science Foundation of China (U2003303); Xinjiang High-Level Talent Introduction Project (Xin Ren She Han [2017] No. 699); "Light of West China" Program of the Chinese Academy of Sciences, Category A (2017-XBQNXZ-A-005); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472; Ke Fa Ren Han Zi [2019] No. 26)

Abstract: To make efficient use of existing resources, a method based on back translation and ensemble learning is proposed to address the poor performance of Uyghur-Chinese neural machine translation caused by the scarcity of Uyghur-Chinese parallel corpora. First, a Uyghur-Chinese pseudo-parallel corpus is constructed from a large-scale Chinese monolingual corpus via back translation, and an intermediate model is trained on this pseudo-parallel corpus. Second, the original parallel corpus is resampled N times with the bootstrap method, yielding N sub-datasets that follow approximately the same distribution but differ from one another; the intermediate model is then fine-tuned on each of the N sub-datasets, producing N distinct sub-models. Finally, these sub-models are ensembled. Experiments on the CWMT2015 and CWMT2017 test sets show that the proposed method improves the BLEU (Bilingual Evaluation Understudy) score over the baseline system by 2.37 and 1.63 points, respectively.

Keywords: neural machine translation  back translation  ensemble learning  intermediate model  fine-tuning  catastrophic forgetting

Received: 2021-04-16
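
The back-translation step described in the abstract can be sketched in a few lines. A minimal illustration, assuming some trained Chinese-to-Uyghur reverse model is available as a callable; the names build_pseudo_parallel and translate_zh_to_ug are hypothetical, not from the paper:

from typing import Callable, Iterable, List, Tuple

def build_pseudo_parallel(
    zh_monolingual: Iterable[str],
    translate_zh_to_ug: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each synthetic Uyghur translation with its real Chinese source.

    Back translation keeps the target (Chinese) side authentic: only the
    source (Uyghur) side is machine-generated, which is what makes the
    pseudo-parallel corpus usable for training a Uyghur-to-Chinese model.
    """
    pairs = []
    for zh in zh_monolingual:
        ug = translate_zh_to_ug(zh)  # synthetic source sentence
        pairs.append((ug, zh))       # (pseudo source, real target)
    return pairs

The intermediate model is then trained on these pairs before any fine-tuning on the genuine parallel corpus.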

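The bootstrap resampling step is equally compact. A sketch assuming the parallel corpus fits in memory as a list of sentence pairs; the function and parameter names are illustrative:

import random

def bootstrap_subsets(corpus, n_subsets, seed=0):
    """Draw n_subsets bootstrap samples of the original parallel corpus.

    Sampling with replacement at the original corpus size gives subsets
    that follow approximately the same distribution yet differ in
    composition, so fine-tuning the intermediate model on each subset
    yields N diverse sub-models.
    """
    rng = random.Random(seed)
    size = len(corpus)
    return [
        [corpus[rng.randrange(size)] for _ in range(size)]
        for _ in range(n_subsets)
    ]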
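
Finally, the N fine-tuned sub-models are combined at decoding time. The abstract does not specify the combination rule; a common choice in NMT, shown here purely as an assumption, is to average the sub-models' next-token probability distributions at each decoding step:

import numpy as np

def ensemble_next_token(prob_dists):
    """Average next-token distributions from N sub-models, then take argmax.

    prob_dists: list of 1-D arrays, one per sub-model, each summing to 1
    over the target vocabulary.
    """
    avg = np.mean(np.stack(prob_dists, axis=0), axis=0)
    return int(np.argmax(avg))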