Document image retrieval based on multi-density features (基于分层密度特征的文档图像检索)

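Cite this article:	HU Zhilan, LIN Xinggang, YAN Hong. Document image retrieval based on multi-density features [J]. Journal of Tsinghua University (Science and Technology), 2006, 46(7): 1231-1234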

Authors:	HU Zhilan, LIN Xinggang, YAN Hong

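Affiliations:	Department of Electronic Engineering, Tsinghua University, Beijing 100084; Department of Computer Engineering and Information Technology, City University of Hong Kong, Hong Kong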

Funding:	National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education

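Abstract:	Document image retrieval methods based on layout reconstruction demand high image quality and are limited to certain languages, while methods based on layout segmentation are constrained by the segmentation technique itself. To overcome these problems, a retrieval method based on multi-density features of binary document images is proposed. The method obtains the effective text region through preprocessing such as skew correction and black-border removal, extracts the aspect ratio and multi-density features of that region, and performs retrieval by comparing features. Experiments show that the method adapts to different resolutions and different input devices, is robust to complex layouts and to noise such as annotations, and has a miss rate of 2%, making it a simple and effective document image retrieval method.
Keywords:	document image; image retrieval; skew correction; multi-density features
Article ID:	1000-0054(2006)07-1231-04
Revised manuscript received:	2005-05-10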
Document image retrieval based on multi-density features

HU Zhilan, LIN Xinggang, YAN Hong. Document image retrieval based on multi-density features [J]. Journal of Tsinghua University (Science and Technology), 2006, 46(7): 1231-1234

Authors:	HU Zhilan, LIN Xinggang, YAN Hong

Abstract:	The growth of document image databases poses new challenges for document image retrieval techniques. Traditional layout-reconstruction-based methods rely on high-quality document images and can handle only a few widely used languages. The complexity of document layouts greatly hinders layout-analysis-based approaches. This paper describes a multi-density feature-based algorithm for binary document images that is independent of optical character recognition (OCR) and layout analysis. The text area is extracted after preprocessing, which includes skew correction and marginal noise removal. The aspect ratio and multi-density features are then extracted from the text area and used to select the best candidates from the document image database. Experimental results show that the approach is simple, with a miss rate below 2%, and can efficiently handle images of different resolutions and from different input systems. The system is also robust to complex layouts and to noise such as annotations.

Keywords:	document image; image retrieval; skew correction; multi-density features
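
The pipeline summarized in the abstract (crop to the text area, compute the aspect ratio and multi-density features, compare features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the multi-density features, so the sketch assumes they are ink densities over successively finer n x n grids. The function names `text_bounding_box`, `multi_density_features`, `describe`, and `retrieve`, and the parameters `levels` and `ratio_tol`, are hypothetical. Skew correction and black-border removal are assumed to have been applied already.

```python
import numpy as np


def text_bounding_box(img):
    """Crop a binary image (nonzero = ink) to the bounding box of its ink pixels."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no ink pixels")
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def multi_density_features(region, levels=(1, 2, 4)):
    """Assumed feature: ink density of every cell of an n x n grid, for each n."""
    h, w = region.shape
    binary = region != 0
    feats = []
    for n in levels:
        # Cell borders are placed proportionally to the region size, so the
        # feature vector has the same length at any scan resolution.
        r_edges = np.linspace(0, h, n + 1).astype(int)
        c_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = binary[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
                feats.append(cell.mean() if cell.size else 0.0)
    return np.array(feats)


def describe(img, levels=(1, 2, 4)):
    """Return (aspect ratio, density feature vector) of the text area."""
    region = text_bounding_box(img)
    h, w = region.shape
    return w / h, multi_density_features(region, levels)


def retrieve(query, database, ratio_tol=0.1, levels=(1, 2, 4)):
    """Rank database images: filter by aspect ratio, then sort by feature distance."""
    q_ratio, q_feat = describe(query, levels)
    scored = []
    for idx, img in enumerate(database):
        ratio, feat = describe(img, levels)
        # Assumed coarse filter: discard candidates whose text-area aspect
        # ratio differs from the query's by more than ratio_tol (relative).
        if abs(ratio - q_ratio) > ratio_tol * q_ratio:
            continue
        # Assumed distance: mean absolute difference of the densities.
        scored.append((float(np.abs(feat - q_feat).mean()), idx))
    return [idx for _, idx in sorted(scored)]
```

Given a hypothetical list `database` of binary NumPy arrays, `retrieve(query, database)` returns candidate indices with the closest match first. Placing the cell borders proportionally is one way such features could stay comparable across scan resolutions, which is the property the abstract reports.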
This article is indexed in CNKI, Wanfang Data, and other databases.
