
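Multi-type Streaming Document Structure Recognition Based on Clustering and Bidirectional Gated Recurrent Unit-Conditional Random Field
Cite this article: Wang Juan, Li Ning, Jiang Yutong, Tian Yingai. Multi-type Streaming Document Structure Recognition Based on Clustering and Bidirectional Gated Recurrent Unit-Conditional Random Field[J]. Science Technology and Engineering, 2021, 21(17): 7208-7216. DOI: 10.3969/j.issn.1671-1815.2021.17.033
Authors: Wang Juan, Li Ning, Jiang Yutong, Tian Yingai
Affiliation: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101
Funding: Intelligent Analysis and Optimization Methods for Streaming Document Typesetting Formats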
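Abstract: Structure recognition for streaming documents plays an important role in automatic document typesetting and optimization, information retrieval, and related fields. Earlier work on streaming document structure recognition concentrated on academic papers, and little research has addressed other document types such as official documents and reports. To address this, documents are first grouped into classes by clustering. On that basis, a structure recognition method built on a bidirectional gated recurrent unit-conditional random field (BIGRU-CRF) is proposed for each document class, in order to solve the problem of multi-type document structure recognition. Experimental results show that the method not only improves structure recognition for academic papers but also recognizes the structure of other document types well.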

Keywords: streaming documents; structure recognition; clustering; multi-type documents
Received: 2020-10-18
Revised: 2021-04-05

Multi-type Streaming Document Structure Recognition Based on Clustering and Bidirectional Gated Recurrent Unit-Conditional Random Field
Wang Juan, Li Ning, Jiang Yutong, Tian Yingai. Multi-type Streaming Document Structure Recognition Based on Clustering and Bidirectional Gated Recurrent Unit-Conditional Random Field[J]. Science Technology and Engineering, 2021, 21(17): 7208-7216. DOI: 10.3969/j.issn.1671-1815.2021.17.033
Authors: Wang Juan, Li Ning, Jiang Yutong, Tian Yingai
Affiliation: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University
Abstract: Structure recognition for streaming documents plays an important role in automatic document layout and optimization, information retrieval, and other fields. Previous work has focused mainly on academic papers, with little research on other document types such as official documents and reports. To address this gap, documents are first classified by clustering, and a structure recognition method based on the Bidirectional Gated Recurrent Unit-Conditional Random Field (BIGRU-CRF) is then proposed for each document class, to solve the problem of multi-type document structure recognition. Experimental results show that the method not only improves structure recognition for academic papers but also performs well on the structures of other document types.
Keywords: streaming documents; structure recognition; clustering; multi-type documents
This article is indexed in CNKI, Wanfang Data, and other databases.