
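Document image retrieval based on multi-density features
Cite this article: HU Zhilan, LIN Xinggang, YAN Hong. Document image retrieval based on multi-density features[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(7): 1231-1234
Authors: HU Zhilan (胡芝兰), LIN Xinggang (林行刚), YAN Hong (严洪)
Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Department of Computer Engineering and Information Technology, City University of Hong Kong, Hong Kong
Funding: National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education
Abstract: Document image retrieval methods based on layout reconstruction require high image quality and are limited to a few languages, while methods based on layout segmentation are constrained by the available segmentation techniques. To overcome these problems, a retrieval method based on multi-density features of binary document images is proposed. The method obtains the effective text area through preprocessing such as skew correction and black-margin removal, extracts the aspect ratio and multi-density features of that area, and retrieves documents by comparing features. Experiments show that the method adapts to different resolutions and input devices, is robust to complex layouts and to noise such as annotations, and has a miss rate of 2%, making it a simple and effective document image retrieval method.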

Keywords: document image; image retrieval; skew correction; multi-density features
Article ID: 1000-0054(2006)07-1231-04
Revised: 2005-05-10

Document image retrieval based on multi-density features
HU Zhilan, LIN Xinggang, YAN Hong. Document image retrieval based on multi-density features[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(7): 1231-1234
Authors: HU Zhilan, LIN Xinggang, YAN Hong
Abstract: The development of document image databases is challenging document image retrieval techniques. Traditional layout reconstruction-based methods rely on high-quality document images and can only deal with several widely used languages. The complexity of document layouts greatly hinders layout analysis-based approaches. This paper describes a multi-density feature-based algorithm for binary document images, which is independent of optical character recognition (OCR) and layout analysis. The text area is extracted after preprocessing, including skew correction and marginal noise removal. The aspect ratio and multi-density features are then extracted from the text area to select the best candidates from the document image database. Experimental results show that this approach is simple, has loss rates of less than 2%, and can efficiently analyze images with different resolutions and from different input systems. The system is also robust to complex layouts and to noise such as notes.
Keywords: document image; image retrieval; skew correction; multi-density features
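
The abstract names the pipeline stages (text-area extraction, aspect ratio, multi-density features, feature comparison) but not their exact definitions. The following is a minimal Python sketch of one plausible reading, not the authors' algorithm. It assumes that "multi-density" means ink density measured on progressively finer grids (1x1, 2x2, 4x4), that the text area is simply the bounding box of ink pixels (skew correction is omitted), and that features are compared by mean absolute difference. The function names `text_area`, `density_pyramid`, `describe`, and `retrieve`, and the parameters `levels`, `aspect_tol`, and `top_k`, are all hypothetical.

```python
import numpy as np


def text_area(img):
    """Crop a binary image (True/1 = ink) to the bounding box of its ink pixels."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no ink pixels")
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def density_pyramid(area, levels=3):
    """Ink density on 1x1, 2x2, 4x4, ... grids (an assumed feature layout)."""
    feats = []
    for level in range(levels):
        n = 2 ** level
        # Split rows and columns into n nearly equal bands.
        r_edges = np.linspace(0, area.shape[0], n + 1).astype(int)
        c_edges = np.linspace(0, area.shape[1], n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = area[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
                # Density is a fraction, so it does not depend on resolution.
                feats.append(cell.mean() if cell.size else 0.0)
    return np.array(feats)


def describe(img, levels=3):
    """Return (aspect ratio, density features) of the cropped text area."""
    area = text_area(np.asarray(img, dtype=bool))
    aspect = area.shape[1] / area.shape[0]  # width / height
    return aspect, density_pyramid(area, levels)


def retrieve(query, database, aspect_tol=0.1, top_k=1):
    """Return indices of the top_k database images closest to the query."""
    q_aspect, q_feat = describe(query)
    scored = []
    for idx, img in enumerate(database):
        aspect, feat = describe(img)
        # Cheap aspect-ratio filter before comparing density features.
        if abs(aspect - q_aspect) > aspect_tol * q_aspect:
            continue
        scored.append((float(np.abs(feat - q_feat).mean()), idx))
    scored.sort()
    return [idx for _, idx in scored[:top_k]]
```

Because each feature is a fraction of ink pixels per cell, a rescan of the same page at a different resolution or with extra blank margins should yield nearly the same vector in this sketch, which is consistent with the resolution independence the abstract reports. A real system would also need the skew correction and marginal noise removal steps that are left out here.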
This article is indexed in CNKI, Wanfang Data, and other databases.