
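Image Caption Generation Method Fusing an Object Spatial Relation Mechanism
Cite this article: WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng. Image caption generation method fusing an object spatial relation mechanism[J]. Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), 2021, 57(1): 75-82.
Authors: WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
Abstract: Focusing on one specific type of information, the positional relations between objects in an image, we propose a neural image caption generation model that incorporates a spatial relation mechanism, with the aim of supplying key information such as object orientation or trajectory to downstream tasks such as visual question answering and voice navigation. To strengthen the image encoder's ability to learn positional relations between objects, a geometric attention mechanism is introduced by modifying the Transformer architecture, explicitly fusing inter-object positional relations into the objects' appearance information. To assist in completing specific...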

Keywords: image caption; positional relationship between objects; attention mechanism; Transformer structure
Received: 2020-06-09
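The abstract says the encoder gains a geometric attention mechanism that explicitly fuses inter-object positional relations into appearance features, but it does not give the formulation. As a minimal sketch, assuming the box-displacement geometry features and the geometry-weighted softmax used in Relation Networks (Hu et al., 2018) and the Object Relation Transformer (Herdade et al., 2019), one attention head could look like the code below. The helper names, the random projections, and the 4-dimensional geometry weight vector `w_g` are hypothetical; a trained model would learn these parameters and would usually embed the geometry feature into a higher dimension before projecting it.

```python
import numpy as np


def box_geometry(boxes: np.ndarray) -> np.ndarray:
    """Pairwise relative geometry features f_mn, shape (N, N, 4).

    boxes: (N, 4) array of (cx, cy, w, h) for each detected object.
    Features follow Relation Networks (an assumption, not taken from the paper):
    log(|cx_m - cx_n| / w_m), log(|cy_m - cy_n| / h_m), log(w_n / w_m), log(h_n / h_m).
    """
    cx, cy, w, h = boxes.T
    eps = 1e-3  # keeps the log finite when two centres coincide (e.g. m == n)
    dx = np.log(np.maximum(np.abs(cx[:, None] - cx[None, :]), eps) / w[:, None])
    dy = np.log(np.maximum(np.abs(cy[:, None] - cy[None, :]), eps) / h[:, None])
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)


def geometric_attention(x, boxes, w_q, w_k, w_v, w_g):
    """One geometry-aware self-attention head over N object features.

    x: (N, d) appearance features; w_q, w_k, w_v: (d, d_k) projections;
    w_g: (4,) hypothetical weights mapping f_mn to a scalar geometry score.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    appearance = q @ k.T / np.sqrt(q.shape[-1])            # omega_A, shape (N, N)
    geometry = np.maximum(box_geometry(boxes) @ w_g, 0.0)   # omega_G = ReLU(w_g . f_mn)
    # Row-normalising omega_G * exp(omega_A) equals softmax(log(omega_G) + omega_A);
    # the 1e-6 floor down-weights pairs the ReLU zeroed out instead of producing -inf.
    logits = np.log(np.maximum(geometry, 1e-6)) + appearance
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
boxes = np.array([[50.0, 60.0, 40.0, 30.0],     # illustrative box, e.g. a cup
                  [120.0, 65.0, 80.0, 50.0],    # illustrative box, e.g. a laptop
                  [90.0, 150.0, 200.0, 60.0]])  # illustrative box, e.g. a table
d, d_k = 16, 8
x = rng.normal(size=(3, d))
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
w_g = rng.normal(size=4)
out, attn = geometric_attention(x, boxes, w_q, w_k, w_v, w_g)
```

Each row of `attn` is a distribution over the objects, so the output for an object mixes the value vectors of its neighbours according to both their appearance similarity and their relative box geometry.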

Object Space Relation Mechanism Fused Image Caption Method
WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng. Object Space Relation Mechanism Fused Image Caption Method[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57(1): 75-82.
Authors: WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng
Institution:School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
Abstract: Focusing on one specific type of information, the positional relationships between objects in an image, this paper proposes a neural image caption generation model that integrates a spatial relation mechanism, in order to provide key information (object position or trajectory) for downstream tasks such as visual question answering and voice navigation. To enhance the image encoder's ability to learn positional relationships between objects, a geometric attention mechanism is introduced by modifying the Transformer structure, explicitly integrating the positional relationships between objects into the objects' appearance information. To support the extraction of this specific information and the corresponding caption generation task, a data production method for relative position relations is further proposed, and Re-Position, an image caption dataset describing positional relationships between objects, is built from the SpatialSense dataset. In comparative evaluations against five typical models, the proposed model outperforms the other models on five metrics on the public COCO test set and on all six metrics on the Re-Position dataset.
Keywords: image caption; positional relationship between objects; attention mechanism; Transformer structure
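The abstract mentions a data production method for relative position relations that turns SpatialSense into the Re-Position caption dataset, without giving details. SpatialSense pairs each (subject, predicate, object) spatial relation with a label saying whether the relation actually holds in the image. A minimal sketch, assuming a fixed template that renders each positively labelled triple as a short caption, is shown below; the `Relation` record, its field names, the helper functions, and the template are all hypothetical and are neither the authors' pipeline nor the dataset's file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One annotated (subject, predicate, object) triple (hypothetical record)."""
    image_id: str
    subject: str
    predicate: str  # e.g. "to the left of", "on", "under"
    obj: str
    label: bool     # True if annotators judged the relation to hold


def with_article(noun: str) -> str:
    """Prefix a noun with 'a' or 'an' using a simple vowel heuristic."""
    return ("an " if noun[:1].lower() in "aeiou" else "a ") + noun


def relation_to_caption(rel: Relation) -> str:
    """Render one triple as a positional caption via a fixed template."""
    return f"{with_article(rel.subject)} is {rel.predicate} {with_article(rel.obj)}"


def build_captions(relations):
    """Group captions by image id, keeping only relations labelled as true."""
    captions = {}  # image_id -> list of caption strings
    for rel in relations:
        if rel.label:
            captions.setdefault(rel.image_id, []).append(relation_to_caption(rel))
    return captions


rels = [
    Relation("img1", "cup", "to the left of", "laptop", True),
    Relation("img1", "apple", "on", "table", True),
    Relation("img2", "dog", "under", "chair", False),  # negative: dropped
]
captions = build_captions(rels)
# captions == {"img1": ["a cup is to the left of a laptop", "an apple is on a table"]}
```

A template keeps every caption focused on exactly one positional relation, which is the specific information the dataset targets; a real pipeline would likely also need to handle plural nouns, duplicate triples, and images with no positive relation.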
This article is indexed in CNKI, Wanfang Data, and other databases.