
基于小规模手写体汉字数据集的数据增强方法
Cite this article: 沈强, 李辉, 张燕. 基于小规模手写体汉字数据集的数据增强方法[J]. 北京化工大学学报(自然科学版), 2021, 48(1): 58-65. DOI: 10.13543/j.bhxbzr.2021.01.008
Authors: 沈强  李辉  张燕
Affiliation: 1. 北京化工大学 信息科学与技术学院, 北京 100029; 2. 北京市电气工程学校, 北京 100123
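Abstract (translated from the Chinese):
A deep convolutional generative adversarial network (DCGAN) trained on a small-scale handwritten Chinese character dataset generates highly repetitive data and gives poor classification results. To address this, we propose X-DCGAN, a combined generation method that incorporates traditional data augmentation. A pre-enhancement module supplies the neural network with more plentiful and diverse training data, reducing the high sample repetition rate and poor learning that result from overfitting and insufficient training. Experimental results show that the samples generated by the proposed method are significantly more diverse than those from either method alone, and that the average recognition rate obtained when the generated data are used in classification tests is 9.67% higher than with the DCGAN method. X-DCGAN makes full use of the respective strengths of traditional data augmentation and generative methods, and can more effectively expand and augment small-scale datasets.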

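Keywords (translated from the Chinese): data augmentation; deep convolutional generative adversarial network; handwritten Chinese character recognition; image processing
Received: 2020-03-31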

A data augmentation method based on a small-scale handwritten Chinese character dataset
SHEN Qiang, LI Hui, ZHANG Yan. A data augmentation method based on a small-scale handwritten Chinese character dataset[J]. Journal of Beijing University of Chemical Technology, 2021, 48(1): 58-65. DOI: 10.13543/j.bhxbzr.2021.01.008
Authors:SHEN Qiang  LI Hui  ZHANG Yan
Affiliation: 1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; 2. Beijing Electrical Engineering School, Beijing 100123, China
Abstract:
To improve the poor classification performance of a deep convolutional generative adversarial network (DCGAN) trained on a small-scale handwritten Chinese character dataset, this work proposes X-DCGAN, a generative method combined with traditional data augmentation methods. A pre-enhancement module supplies the neural network with more diverse training data, which addresses the high sample repetition rate and poor learning that result from overfitting and insufficient training. Tests showed that the samples generated by this method are significantly more diverse than those produced by either method alone. In addition, the average recognition accuracy obtained when the generated data were used for classification testing improved by 9.67% relative to DCGAN. X-DCGAN makes full use of the advantages of both traditional augmentation methods and generative methods, and can thus more effectively expand and augment a small-scale dataset.
Keywords:data augmentation   deep convolutional generative adversarial networks   handwritten Chinese character recognition   image processing
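
The abstract does not say which traditional augmentations the pre-enhancement module applies, so the following is only a minimal sketch of the general idea: enlarge a small set of binary glyph images with classical transforms before they reach a generative model. The choice of transforms (random translation and pixel-flip noise), the function names `shift`, `add_noise`, and `pre_augment`, and all parameter values are assumptions made for illustration, not the authors' implementation. The DCGAN itself is not shown.

```python
import random


def shift(img, dx, dy, fill=0):
    """Translate a 2D image (list of rows) by (dx, dy), padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def add_noise(img, rate, rng):
    """Flip each binary pixel (0 or 1) independently with probability `rate`."""
    return [[1 - p if rng.random() < rate else p for p in row] for row in img]


def pre_augment(images, copies=4, max_shift=2, noise_rate=0.02, seed=0):
    """Return the originals plus `copies` perturbed variants of each image.

    Hypothetical stand-in for a pre-enhancement step: the enlarged list
    would be used as the training set of the generative network.
    """
    rng = random.Random(seed)
    augmented = list(images)  # keep the original samples first
    for img in images:
        for _ in range(copies):
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            augmented.append(add_noise(shift(img, dx, dy), noise_rate, rng))
    return augmented
```

Calling `pre_augment(glyphs, copies=4)` on a list of `n` images returns `5 * n` images of the same size, with the originals first. In practice, rotation, scaling, or elastic distortion would be natural additions for handwriting, but whether the paper uses them is not stated here.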
This article is indexed in CNKI, Wanfang Data, and other databases.