TY - JOUR
T1 - Modality independent adversarial network for generalized zero shot image classification
AU - Zhang, Haofeng
AU - Wang, Yinduo
AU - Long, Yang
AU - Yang, Longzhi
AU - Shao, Ling
N1 - Funding information: This work was supported in part by National Natural Science Foundation of China (NSFC) under Grants No. 61872187, No. 62072246 and No. 61929104, in part by the Natural Science Foundation of Jiangsu Province under Grant No. BK20201306, in part by the Medical Research Council (MRC) Innovation Fellowship (UK) under Grant No. MR/S003916/1, and in part by the “111” Program under Grant No. B13022.
PY - 2021/2/1
Y1 - 2021/2/1
N2 - Zero Shot Learning (ZSL) aims to classify images of unseen target classes by transferring knowledge from source classes through semantic embeddings. The core of ZSL research is to embed both the visual representation of an object instance and the semantic description of its class into a joint latent space and to learn cross-modal (visual and semantic) latent representations. However, the representations learned by existing methods often fail to fully capture the underlying cross-modal semantic consistency, and some of the representations are highly similar and insufficiently discriminative. To address these issues, in this paper we propose a novel deep framework, called Modality Independent Adversarial Network (MIANet), for Generalized Zero Shot Learning (GZSL), which is an end-to-end deep architecture with three submodules. First, both the visual feature and the semantic description are embedded into a latent hyper-spherical space, where two orthogonal constraints are employed to ensure that the learned latent representations are discriminative. Second, a modality adversarial submodule is employed to make the latent representations independent of modality, so that the shared representations capture more high-level cross-modal semantic information during training. Third, a cross reconstruction submodule is proposed to reconstruct latent representations into their cross-modal counterparts rather than themselves, so that they capture more modality-irrelevant information. Comprehensive experiments on five widely used benchmark datasets are conducted under both GZSL and standard ZSL settings, and the results demonstrate the effectiveness of the proposed method.
AB - Zero Shot Learning (ZSL) aims to classify images of unseen target classes by transferring knowledge from source classes through semantic embeddings. The core of ZSL research is to embed both the visual representation of an object instance and the semantic description of its class into a joint latent space and to learn cross-modal (visual and semantic) latent representations. However, the representations learned by existing methods often fail to fully capture the underlying cross-modal semantic consistency, and some of the representations are highly similar and insufficiently discriminative. To address these issues, in this paper we propose a novel deep framework, called Modality Independent Adversarial Network (MIANet), for Generalized Zero Shot Learning (GZSL), which is an end-to-end deep architecture with three submodules. First, both the visual feature and the semantic description are embedded into a latent hyper-spherical space, where two orthogonal constraints are employed to ensure that the learned latent representations are discriminative. Second, a modality adversarial submodule is employed to make the latent representations independent of modality, so that the shared representations capture more high-level cross-modal semantic information during training. Third, a cross reconstruction submodule is proposed to reconstruct latent representations into their cross-modal counterparts rather than themselves, so that they capture more modality-irrelevant information. Comprehensive experiments on five widely used benchmark datasets are conducted under both GZSL and standard ZSL settings, and the results demonstrate the effectiveness of the proposed method.
KW - Adversarial network
KW - Cross reconstruction
KW - Generalized Zero Shot Learning (GZSL)
KW - Modality independent learning
KW - Orthogonal constraint
UR - http://www.scopus.com/inward/record.url?scp=85097135995&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2020.11.007
DO - 10.1016/j.neunet.2020.11.007
M3 - Article
C2 - 33278759
AN - SCOPUS:85097135995
SN - 0893-6080
VL - 134
SP - 11
EP - 22
JO - Neural Networks
JF - Neural Networks
ER -