
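Research on Training Data Augmentation Methods for Manchu Character Recognition
Cite this article: BI Jia-jing, LI Min, ZHENG Rui-rui, XU Shuang, HE Jian-jun, HUANG Di. Research on Training Data Augmentation Methods for Manchu Character Recognition[J]. Journal of Dalian Nationalities University, 2018, 20(1): 73-78.
Authors: BI Jia-jing (毕佳晶)  LI Min (李敏)  ZHENG Rui-rui (郑蕊蕊)  XU Shuang (许爽)  HE Jian-jun (贺建军)  HUANG Di (黄荻)
Affiliations: 1. School of Information and Communication Engineering, Dalian Minzu University, Dalian, Liaoning 116605;
2. School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021
Funding: National Natural Science Foundation of China (61702081); Natural Science Foundation of Liaoning Province, Guidance Program (201602205; 2015020084; L2015127); Fundamental Research Funds for the Central Universities (DC201502060202; DC201502060407; DC201502060301); Dalian Youth Science and Technology Star Project (2016RQ072)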
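Abstract (translated from the Chinese): To address the shortage of training samples when deep learning is applied to Manchu recognition, this paper proposes a technical framework that expands the training set through data augmentation. The framework comprises two modules, geometric deformation of the character structure and image quality transformation. It uses nine data generation methods, including affine transformation and elastic deformation, to simulate how Manchu character images look when captured with varying stroke thickness, distortion, uneven illumination, different viewing angles, and different backgrounds. In a study of Manchu recognition, the method was used to expand each character class to 70,000 samples. Experiments show that the generated data compensates to some extent for the lack of training samples, making the approach an effective technical means of addressing training data scarcity.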

Keywords (translated from the Chinese): optical character recognition  Manchu recognition  data augmentation  data generation

Research on Training Data Augmentation Methods for Manchu Character Recognition
BI Jia-jing, LI Min, ZHENG Rui-rui, XU Shuang, HE Jian-jun, HUANG Di. Research on Training Data Augmentation Methods for Manchu Character Recognition[J]. Journal of Dalian Nationalities University, 2018, 20(1): 73-78.
Authors:BI Jia-jing  LI Min  ZHENG Rui-rui  XU Shuang  HE Jian-jun  HUANG Di
Institution:1. School of Information and Communication Engineering, Dalian Minzu University, Dalian, Liaoning 116605, China;
2. School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021, China
Abstract:To address the shortage of training data for Manchu character recognition with deep learning methods, this paper proposes a technical framework that expands the training set through data augmentation. The framework consists of two modules: character structure distortion and image quality transformation. It includes 9 synthetic data generation methods, such as affine transformation and elastic deformation, which simulate effects that arise when Manchu word images are collected, including stroke thickness variation, font distortion, uneven illumination, and different perspectives and backgrounds. For a study on Manchu word recognition, the framework was used to generate 70,000 synthetic samples for each class of Manchu word. Experiments demonstrate that the synthetic data produced by the framework compensates, to a certain degree, for the shortage of training samples, and that the proposed data augmentation methods are an effective way to address insufficient training data.
Keywords:optical character recognition  Manchu recognition  data augmentation  synthetic data  
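
The abstract names the two modules but gives no implementation details. As a rough illustration only, the sketch below applies one method from each module to a grayscale image stored as a list of rows with values in [0, 1]: a random affine warp (structure distortion) followed by a left-to-right brightness falloff (a simple stand-in for uneven illumination). The function names `affine_warp`, `uneven_illumination`, and `augment`, the parameter ranges, and the illumination model are all assumptions made for this example, not the authors' method.

```python
import math
import random


def affine_warp(img, a, b, c, d, fill=1.0):
    """Warp an image by the 2x2 matrix [[a, b], [c, d]] about its centre.

    Uses inverse mapping with nearest-neighbour sampling; pixels that map
    outside the source image are set to `fill` (white background assumed).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is not invertible")
    # The inverse matrix maps each output pixel back to its source position.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = round(ia * dx + ib * dy + cx)
            sy = round(ic * dx + id_ * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def uneven_illumination(img, strength=0.3):
    """Darken the image linearly from the left edge to the right edge."""
    w = len(img[0])
    span = max(w - 1, 1)
    return [
        [min(1.0, max(0.0, v * (1.0 - strength * x / span)))
         for x, v in enumerate(row)]
        for row in img
    ]


def augment(img, n, seed=0):
    """Generate n synthetic variants of img (parameter ranges are assumed)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        angle = math.radians(rng.uniform(-10, 10))
        shear = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.9, 1.1)
        cos_t, sin_t = math.cos(angle), math.sin(angle)
        # Rotation matrix times horizontal shear matrix, then uniform scale.
        a = scale * cos_t
        b = scale * (cos_t * shear - sin_t)
        c = scale * sin_t
        d = scale * (sin_t * shear + cos_t)
        warped = affine_warp(img, a, b, c, d)
        samples.append(uneven_illumination(warped, rng.uniform(0.0, 0.4)))
    return samples
```

Calling `augment(image, 70000)` on one source image per class would mirror the per-class sample count quoted in the abstract, although the paper combines nine methods rather than the two shown here. In practice an image library with interpolation would replace the nearest-neighbour loop.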