Research on Feature and Similarity Measurement in Character Recognition
Cite this article:LI Jie, FANG Muyun. Research on Feature and Similarity Measurement in Character Recognition[J]. Journal of Yancheng Institute of Technology (Natural Science Edition), 2016, 29(4): 42-46
Authors:LI Jie, FANG Muyun
Affiliation:School of Computer Science and Technology, Anhui University of Technology, Maanshan, Anhui 243002, China (both authors)
Funding:Anhui Provincial Natural Science Foundation (Grant No. 1308085QF113)
Received:2016-05-20
Abstract:On large-sample test sets, the mature OCR software currently available in China achieves a first-candidate recognition accuracy of 95% to 97%, which leaves room for improvement in both accuracy and method. This paper proposes an adaptive character recognition algorithm that fuses probability features with structural features. Imitating the way humans learn, the algorithm learns continually from training samples to build a probability distribution matrix for each Chinese character in the measurement space. It then classifies an input image by comparing the similarity between the probability distribution matrix of the image and those of the characters in a standard Chinese character library. The similarity criterion is constructed from two perspectives on the matrix space, structure and probability, and takes into account the strengths and weaknesses of both structural and statistical pattern recognition. In the experiments, the first-candidate recognition accuracy reaches 99.66% on the training samples, 99.13% on 1,623 non-training character images, and 98.57% on 5,515 non-training character images. These results indicate that the proposed similarity measure is effective for character recognition.

Keywords:probability feature; structure feature; similarity; character recognition
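
The abstract gives no formulas for the probability distribution matrix or the similarity criterion, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes binarized character images of equal size, estimates each character's matrix as a smoothed per-pixel stroke frequency, and combines a probabilistic score (geometric mean of per-pixel likelihoods) with a structural score (Jaccard overlap with the thresholded matrix). The function names, the weight `alpha`, the threshold 0.5, and the 5x5 toy glyphs are all hypothetical.

```python
import numpy as np

def build_templates(samples):
    """Estimate one probability matrix per label from binary images (1 = stroke).

    Assumption: per-pixel stroke frequency with Laplace smoothing, so that
    every probability stays strictly between 0 and 1.
    """
    templates = {}
    for label, images in samples.items():
        stack = np.stack(images).astype(float)
        templates[label] = (stack.sum(axis=0) + 1.0) / (len(images) + 2.0)
    return templates

def probability_score(image, template):
    """Geometric mean of the likelihood of each observed pixel value."""
    likelihood = np.where(image == 1, template, 1.0 - template)
    return float(np.exp(np.log(likelihood).mean()))

def structure_score(image, template, threshold=0.5):
    """Jaccard overlap between the image strokes and the template's likely strokes."""
    core = template >= threshold
    strokes = image == 1
    union = np.logical_or(core, strokes).sum()
    if union == 0:
        return 1.0  # both empty: treat as identical
    return float(np.logical_and(core, strokes).sum() / union)

def similarity(image, template, alpha=0.5):
    """Weighted fusion of the two scores; alpha is a hypothetical mixing weight."""
    return (alpha * probability_score(image, template)
            + (1.0 - alpha) * structure_score(image, template))

def classify(image, templates, alpha=0.5):
    """Return the label whose template is most similar to the image."""
    return max(templates, key=lambda label: similarity(image, templates[label], alpha))

def with_pixel(image, row, col, value):
    """Copy of `image` with a single pixel changed (simulates noise)."""
    noisy = image.copy()
    noisy[row, col] = value
    return noisy

# Toy 5x5 "characters": a vertical bar and a horizontal bar.
vertical = np.zeros((5, 5), dtype=int)
vertical[:, 2] = 1
horizontal = np.zeros((5, 5), dtype=int)
horizontal[2, :] = 1

samples = {
    "|": [vertical, with_pixel(vertical, 0, 0, 1), with_pixel(vertical, 4, 2, 0)],
    "-": [horizontal, with_pixel(horizontal, 4, 4, 1), with_pixel(horizontal, 2, 0, 0)],
}
templates = build_templates(samples)

# A query image that was not in the training set: vertical bar plus one stray pixel.
query = with_pixel(vertical, 1, 1, 1)
print(classify(query, templates))
```

Keeping the two scores separate before fusing them mirrors the abstract's description of a criterion built from both structure and probability; how the paper actually weights or combines them is not stated there.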
This article is indexed in CNKI and other databases.