Object Space Relation Mechanism Fused Image Caption Method (融合物体空间关系机制的图像摘要生成方法)

Cite this article:	万璋, 张玉洁, 刘明童, 徐金安, 陈钰枫. 融合物体空间关系机制的图像摘要生成方法[J]. 北京大学学报(自然科学版), 2021, 57(1): 75-82.

Authors:	万璋 (WAN Zhang), 张玉洁 (ZHANG Yujie), 刘明童 (LIU Mingtong), 徐金安 (XU Jin'an), 陈钰枫 (CHEN Yufeng)

Institution:	School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

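Abstract (translated from the Chinese):	Focusing on one specific kind of information, the positional relationships between objects in an image, this paper proposes a neural image captioning model that incorporates a spatial relation mechanism, with the aim of supplying key information such as object position or trajectory to downstream tasks such as visual question answering and voice navigation. To strengthen the image encoder's ability to learn positional relationships between objects, a geometric attention mechanism is introduced by modifying the Transformer structure, so that the positional relationships between objects are explicitly fused into the objects' appearance information. To assist in completing specific...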
Keywords:	image caption; positional relationship between objects; attention mechanism; Transformer structure
Received:	2020-06-09
Object Space Relation Mechanism Fused Image Caption Method

WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng. Object Space Relation Mechanism Fused Image Caption Method[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57(1): 75-82.

Authors:	WAN Zhang, ZHANG Yujie, LIU Mingtong, XU Jin'an, CHEN Yufeng

Institution:	School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

Abstract:	This paper focuses on one specific kind of information, the positional relationships between objects in an image, and proposes a neural image captioning model that incorporates a spatial relation mechanism, with the aim of supplying key information such as object position or trajectory to downstream tasks such as visual question answering and voice navigation. To strengthen the image encoder's ability to learn positional relationships between objects, a geometric attention mechanism is introduced by modifying the Transformer structure, so that the positional relationships between objects are explicitly fused into the objects' appearance information. To support extraction and caption generation oriented toward specific information, a method for producing relative-position data is further proposed, and Re-Position, an image caption dataset of positional relationships between objects, is built on the SpatialSense dataset. In comparative evaluations against five typical models, the proposed model outperforms the other models on five metrics on the public COCO test set and on all six metrics on the Re-Position dataset.

Keywords:	image caption; positional relationship between objects; attention mechanism; Transformer structure
This article is indexed in CNKI, Wanfang Data, and other databases.