Long-Span Video Compressive Sensing Reconstruction Based on Transformer Architecture



Citation:	XIA Kaiguo, TIAN Chang, MAO Pengqiang. Long-Span Video Compressive Sensing Reconstruction Based on Transformer Architecture[J]. Journal of PLA University of Science and Technology (Natural Science Edition), 2023(3): 67-72.

Authors:	XIA Kaiguo, TIAN Chang, MAO Pengqiang

Affiliations:	1. College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China; 2. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China

Funding:	National Natural Science Foundation of China (62076251)

Keywords:	video compressive sensing; deep learning; Transformer; long-span

Received:	2023-03-07

Abstract:	The rapid development of deep learning has opened new approaches to video compressive sensing reconstruction. Constrained by their network models, existing deep-learning-based reconstruction methods cannot fully exploit the spatial-temporal features of video, and their reconstruction quality is unsatisfactory for video segments longer than 16 frames. To address this problem, this paper builds the compressive sensing reconstruction network on a Transformer. Drawing on the Transformer's strength in processing sequential signals, a spatial-temporal attention extraction module is constructed to learn spatial-temporal attention features between video frames. This allows consecutive video frames to be modeled more effectively, which addresses compressive sensing reconstruction of long-span video segments. Experimental results show that the proposed method achieves a reconstruction accuracy above 30 dB on video segments of 32 frames and still exceeds 27 dB on segments of 96 frames.
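The abstract does not give the internals of the spatial-temporal attention extraction module, so the following is only a minimal sketch of the general mechanism it builds on: scaled dot-product self-attention applied across a sequence of per-frame feature vectors. The function names (`softmax`, `temporal_self_attention`), the identity query/key/value projections, and the feature dimension are all illustrative assumptions and not the authors' design. Only the temporal direction is shown; a spatial counterpart would apply the same operation over patch tokens within a frame.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(tokens):
    """Scaled dot-product self-attention over a sequence of frame tokens.

    tokens: list of T feature vectors (one per frame), each of length d.
    Assumption: query/key/value projections are the identity to keep the
    sketch short; a trained Transformer would learn these projections.
    Returns (outputs, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this frame to every frame in the segment.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        # Each output is a weighted average of all frames' features.
        mixed = [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(d)]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


if __name__ == "__main__":
    random.seed(0)
    num_frames, dim = 32, 8  # 32 frames as in the abstract; dim is arbitrary
    frames = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]
    out, attn = temporal_self_attention(frames)
    print(len(out), len(out[0]))   # one output vector per frame
    print(round(sum(attn[0]), 6))  # each attention row sums to 1
```

Because every frame attends to every other frame in the segment, the attention matrix has T x T entries, so the cost of this step grows quadratically with the segment length (32 versus 96 frames in the reported experiments).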
