A2GSTran: Depth Map Super-resolution via Asymmetric Attention with Guidance Selection

Yifan Zuo, Yaping Xu, Yifeng Zeng, Yuming Fang*, Xiaoshui Huang, Jiebin Yan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)
12 Downloads (Pure)

Abstract

Currently, Convolutional Neural Networks (CNNs) dominate guided depth map super-resolution (SR). However, inefficient receptive-field growth and input-independent convolution limit the generalization of CNNs. Motivated by the vision transformer, this paper proposes an efficient transformer-based backbone, A2GSTran, for guided depth map SR, which resolves these intrinsic defects of CNNs. In addition, state-of-the-art (SOTA) models only refine depth features with guidance that is implicitly selected without supervision, so there is no explicit guarantee that the artifacts of texture copying and edge blurring are mitigated. Accordingly, the proposed A2GSTran simultaneously solves two sub-problems, *i.e.*, guided monocular depth estimation and guided depth SR, in separate branches. Specifically, explicit supervision on monocular depth estimation improves the efficiency of guidance selection. Feature fusion between the branches is designed via bi-directional cross attention. Moreover, since the guidance domain is defined at high resolution (HR), we propose asymmetric cross attention, which preserves guidance information via pixel unshuffle rather than pooling, yielding a channel number unequal to that of the depth features. Based on the supervision of depth reconstruction and guidance selection, the final depth features are refined by fusing the output features of the corresponding branches via channel attention to generate the HR depth map. Extensive experimental results on synthetic and real datasets at multiple scales validate our contributions against SOTA models. The code and models are publicly available at https://github.com/alex-cate/Depth_Map_Super-resolution_via_Asymmetric_Attention_with_Guidance_Selection
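The key point of the asymmetric design above is that pixel unshuffle downsamples the HR guidance spatially without discarding information, whereas pooling averages detail away. A minimal NumPy sketch (illustrative only, not the authors' implementation; function names are our own) contrasts the two:

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Rearrange an (H, W, C) array into (H//r, W//r, C*r*r).

    Unlike pooling, this is a lossless rearrangement: every input
    value survives in the output, merely moved into the channel
    dimension, so guidance detail is preserved at lower resolution.
    """
    h, w, c = x.shape
    assert h % r == 0 and w % r == 0, "spatial dims must divide by r"
    # Split each spatial axis into (block index, offset within block),
    # then fold the within-block offsets into the channel axis.
    x = x.reshape(h // r, r, w // r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/r, W/r, r, r, C)
    return x.reshape(h // r, w // r, r * r * c)

def avg_pool(x, r):
    """Average pooling with stride r: detail inside each r x r block
    collapses to one mean and cannot be recovered afterwards."""
    h, w, c = x.shape
    return x.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))

# Toy HR guidance feature map: 8 x 8 with 3 channels.
guidance = np.arange(8 * 8 * 3, dtype=np.float64).reshape(8, 8, 3)
unshuffled = pixel_unshuffle(guidance, 2)     # (4, 4, 12): all values kept
pooled = avg_pool(guidance, 2)                # (4, 4, 3): detail averaged away
```

Note the channel asymmetry this creates: the unshuffled guidance carries `C*r*r` channels against the depth branch's `C`, which is why the paper's cross attention must handle unequal channel numbers between the two streams.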

Original language: English
Pages (from-to): 4668-4681
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 6
Early online date: 26 Oct 2023
DOIs
Publication status: Published - 1 Jun 2024

Keywords

  • Asymmetric Cross Attention
  • Convolutional neural networks
  • Estimation
  • Feature extraction
  • Guidance Selection
  • Guided Depth Map Super-resolution
  • Image reconstruction
  • Self-attention
  • Solid modeling
  • Superresolution
  • Transformers
  • Vision Transformer
