
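Research on structural knowledge extraction and organization for multi-modal governmental documents
Cite this article: XU Ruilin, GENG Boying, LIU Shukan. Research on structural knowledge extraction and organization for multi-modal governmental documents[J]. Systems Engineering and Electronics, 2022, 44(7): 2241-2250. DOI: 10.12305/j.issn.1001-506X.2022.07.20
Authors: XU Ruilin, GENG Boying, LIU Shukan
Affiliations: 1. School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, Hubei, China; 2. Unit 91001 of the Chinese People's Liberation Army, Beijing 100036, China; 3. School of Computer Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China
Funding: Natural Science Foundation of Hubei Province (2018CFC800)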
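Abstract (translated from the Chinese): Knowledge graphs built from triples have weak structural logic and are hard to organize into a knowledge system. Motivated by governmental document applications, this paper proposes a multi-modal knowledge structure element extraction model and constructs GovDoc-CN, a multi-modal governmental document dataset. Knowledge structure elements in a document, including headings at every level, the abstract, the author, the completion date, and the document number, are extracted from both the text and image modalities. A document structure tree model is designed to organize the extracted elements, and a structured graph network is built to organize and manage documents. Experiments show that the multi-modal extraction model clearly outperforms single-modality extraction models, and that the document structure tree model and the structured graph network built on it offer a new approach to organizing and managing document knowledge, with significant application value.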

Keywords: multi-modal; information extraction; knowledge organization; document structuring; governmental document automation
Received: 2020-12-16

Research on structural knowledge extraction and organization for multi-modal governmental documents
Ruilin XU, Boying GENG, Shukan LIU. Research on structural knowledge extraction and organization for multi-modal governmental documents[J]. Systems Engineering and Electronics, 2022, 44(7): 2241-2250. DOI: 10.12305/j.issn.1001-506X.2022.07.20
Authors:Ruilin XU  Boying GENG  Shukan LIU
Affiliation: 1. School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China; 2. Unit 91001 of the PLA, Beijing 100036, China; 3. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Abstract: Triplet-based knowledge in large-scale knowledge graphs lacks structural logic and is difficult to organize into a knowledge system. To address this, the paper presents a multi-modal governmental document dataset called GovDoc-CN and proposes a multi-modal knowledge structure element extraction model, which extracts knowledge structure elements such as titles, abstracts, authors, time of completion, and document number from documents through both the text and image modalities. A document structure tree (DST) model is designed to organize the extracted elements, and a structured graph network is constructed to organize and manage documents. Experiments show that the multi-modal extraction model achieves a significant improvement over single-modal extraction models. The DST model and the structured graph network built on it provide a new way to organize and manage document knowledge and have significant application value.
Keywords:multi-modal  information extraction  knowledge organization  document structuring  governmental documents automation  
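
The abstract names a document structure tree (DST) for organizing extracted elements but does not give its definition. The sketch below is only an illustration of the general idea of nesting extracted elements by heading level, not the authors' model. The names `Node`, `build_tree`, and `outline`, the `(level, label, text)` input format, and the choice to attach metadata such as the document number directly under the root are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One structural element; level 0 is reserved for the document root."""
    level: int
    label: str
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(doc_title: str, elements: List[Tuple[int, str, str]]) -> Node:
    """Nest a flat, reading-order list of (level, label, text) under a root.

    Hypothetical input format: level 1 is the shallowest element,
    larger numbers are deeper headings.
    """
    root = Node(0, "document", doc_title)
    stack = [root]  # path from the root to the most recently added node
    for level, label, text in elements:
        if level < 1:
            raise ValueError("element levels must be >= 1")
        # Pop until the top of the stack is shallower than the new element.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(level, label, text)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return an indented, depth-first listing of the tree."""
    lines = ["  " * depth + f"{node.label}: {node.text}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Made-up elements standing in for the output of an extraction model.
elements = [
    (1, "doc_number", "No. 12"),
    (1, "heading", "1 Background"),
    (2, "heading", "1.1 Scope"),
    (1, "heading", "2 Measures"),
    (1, "author", "Office A"),
]
tree = build_tree("Sample notice", elements)
print("\n".join(outline(tree)))
```

A stack of ancestors keeps the construction to a single pass over the elements in reading order. Linking such trees into a graph across documents, as the abstract describes, is outside this sketch.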