Uyghur-Chinese neural machine translation method based on back translation and ensemble learning

Citation: FENG Xiao, YANG Ya-ting, DONG Rui, AZMAT Anwar, MA Bo. Uyghur-Chinese neural machine translation method based on back translation and ensemble learning[J]. Journal of Lanzhou University of Technology, 2022, 48(5): 99.

Authors: FENG Xiao  YANG Ya-ting  DONG Rui  AZMAT Anwar  MA Bo

Institution: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China

Foundation items: National Natural Science Foundation of China (U2003303); Xinjiang High-Level Talent Introduction Project (Xin Ren She Han [2017] No. 699); "Light of West China" Program of the Chinese Academy of Sciences, Category A (2017-XBQNXZ-A-005); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472; Ke Fa Ren Han Zi [2019] No. 26)

Abstract: To make efficient use of existing resources, a method based on back translation and ensemble learning is proposed to address the poor performance of Uyghur-Chinese neural machine translation caused by the scarcity of Uyghur-Chinese parallel corpora. First, a Uyghur-Chinese pseudo-parallel corpus is constructed from a large-scale Chinese monolingual corpus via back translation, and an intermediate model is trained on this pseudo-parallel corpus. Second, the original parallel corpus is resampled N times with the bootstrap method, yielding N sub-datasets that follow approximately the same distribution but differ from one another; the intermediate model is then fine-tuned on each of the N sub-datasets, producing N distinct sub-models. Finally, these sub-models are ensembled. Experiments on the CWMT2015 and CWMT2017 test sets show that the proposed method improves the BLEU (Bilingual Evaluation Understudy) score over the baseline system by 2.37 and 1.63 points, respectively.

Keywords: neural machine translation  back translation  ensemble learning  intermediate model  fine-tuning  catastrophic forgetting

Received: 2021-04-16
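
The back-translation step described in the abstract can be sketched in a few lines. A minimal illustration, assuming some trained Chinese-to-Uyghur reverse model is available as a callable; the names build_pseudo_parallel and translate_zh_to_ug are hypothetical, not from the paper:

from typing import Callable, Iterable, List, Tuple

def build_pseudo_parallel(
    zh_monolingual: Iterable[str],
    translate_zh_to_ug: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each synthetic Uyghur translation with its real Chinese source.

    Back translation keeps the target (Chinese) side authentic: only the
    source (Uyghur) side is machine-generated, which is what makes the
    pseudo-parallel corpus usable for training a Uyghur-to-Chinese model.
    """
    pairs = []
    for zh in zh_monolingual:
        ug = translate_zh_to_ug(zh)  # synthetic source sentence
        pairs.append((ug, zh))       # (pseudo source, real target)
    return pairs

The intermediate model is then trained on these pairs before any fine-tuning on the genuine parallel corpus.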

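The bootstrap resampling step is equally compact. A sketch assuming the parallel corpus fits in memory as a list of sentence pairs; the function and parameter names are illustrative:

import random

def bootstrap_subsets(corpus, n_subsets, seed=0):
    """Draw n_subsets bootstrap samples of the original parallel corpus.

    Sampling with replacement at the original corpus size gives subsets
    that follow approximately the same distribution yet differ in
    composition, so fine-tuning the intermediate model on each subset
    yields N diverse sub-models.
    """
    rng = random.Random(seed)
    size = len(corpus)
    return [
        [corpus[rng.randrange(size)] for _ in range(size)]
        for _ in range(n_subsets)
    ]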
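
Finally, the N fine-tuned sub-models are combined at decoding time. The abstract does not specify the combination rule; a common choice in NMT, shown here purely as an assumption, is to average the sub-models' next-token probability distributions at each decoding step:

import numpy as np

def ensemble_next_token(prob_dists):
    """Average next-token distributions from N sub-models, then take argmax.

    prob_dists: list of 1-D arrays, one per sub-model, each summing to 1
    over the target vocabulary.
    """
    avg = np.mean(np.stack(prob_dists, axis=0), axis=0)
    return int(np.argmax(avg))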