A2GSTran: Depth Map Super-resolution via Asymmetric Attention with Guidance Selection

Yifan Zuo, Yaping Xu, Yifeng Zeng, Yuming Fang, Xiaoshui Huang, Jiebin Yan

Research output: Contribution to journal › Article › peer-review


Currently, Convolutional Neural Networks (CNNs) dominate guided depth map super-resolution (SR). However, inefficient receptive-field growth and input-independent convolution limit the generalization of CNNs. Motivated by the vision transformer, this paper proposes an efficient transformer-based backbone, A2GSTran, for guided depth map SR, which resolves these intrinsic defects of CNNs. In addition, state-of-the-art (SOTA) models refine depth features only with guidance that is implicitly selected without supervision, so there is no explicit guarantee of mitigating texture-copying and edge-blurring artifacts. Accordingly, the proposed A2GSTran simultaneously solves two sub-problems, i.e., guided monocular depth estimation and guided depth SR, in separate branches. Specifically, explicit supervision on monocular depth estimation improves the efficiency of guidance selection. Feature fusion between the branches is designed via bi-directional cross attention. Moreover, since the guidance domain is defined in high resolution (HR), we propose asymmetric cross attention, which preserves the guidance information via pixel unshuffle instead of pooling, yielding a channel number unequal to that of the depth features. Based on the supervisions of depth reconstruction and guidance selection, the final depth features are refined by fusing the output features of the corresponding branches via channel attention to generate the HR depth map. Extensive experimental results on synthetic and real datasets at multiple scales validate our contributions compared with SOTA models. The code and models are publicly available at https://github.com/alex-cate/Depth_Map_Super-resolution_via_Asymmetric_Attention_with_Guidance_Selection
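The key shape manipulation in the asymmetric attention can be illustrated in isolation: pixel unshuffle rearranges an HR guidance feature map so that its spatial grid matches the LR depth features while all detail is kept in extra channels, whereas pooling would discard it. The following is a minimal NumPy sketch of this idea only, not the authors' implementation; the function names and the toy sizes are hypothetical.

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Rearrange a (C, H, W) array into (C*r*r, H//r, W//r).

    Spatial detail is folded into channels, so no information is
    lost (unlike average pooling, which would discard HR detail).
    Hypothetical helper for illustration only.
    """
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)          # (C, r, r, H//r, W//r)
    return x.reshape(c * r * r, h // r, w // r)

# Toy example: HR guidance features (3 channels, 8x8) and scale r=2.
guidance_hr = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)
guidance_lr = pixel_unshuffle(guidance_hr, r=2)

# The spatial grid now matches 4x4 LR depth features, but the
# channel count (3*2*2 = 12) differs from the depth branch --
# the "asymmetric" part, handled by learned projections in practice.
print(guidance_lr.shape)        # (12, 4, 4)
print(guidance_lr.sum() == guidance_hr.sum())  # True: nothing discarded
```

In a full cross-attention block, queries would come from the LR depth features and keys/values from this unshuffled guidance after channel projections; the sketch above only demonstrates why the channel numbers end up unequal.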
Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Early online date: 26 Oct 2023
Publication status: E-pub ahead of print - 26 Oct 2023