Recently, considerable effort has been dedicated to unconstrained face recognition, which requires identifying faces 'in the wild' from a set of images and/or video frames captured without human intervention. Unlike traditional face recognition, which only compares media one-to-one (each either a single image or a video frame), we face the problem of matching sets with heterogeneous contents containing both images and videos. In this paper, we propose a novel set-to-set (S2S) distance measure to calculate the similarity between two sets, with the aim of improving recognition accuracy for faces with real-world challenges, such as extreme poses or severe illumination conditions. Our S2S distance adopts kNN-average pooling for the similarity scores computed on all the media in two sets, making the identification far less susceptible to poor representations (outliers) than traditional feature-average pooling and score-average pooling. Furthermore, we show that various metrics can be embedded into our S2S distance framework, including both predefined and learned ones. This allows the appropriate metric to be chosen for each recognition task in order to achieve the best results. To evaluate the proposed S2S distance, we conduct extensive experiments on the challenging set-based IJB-A face data set, which demonstrate that our algorithm achieves state-of-the-art results and is clearly superior to the baselines, including several deep learning-based face recognition algorithms.
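The difference between the three pooling strategies named above can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the exact formula, so the sketch assumes cosine similarity as the metric and assumes that kNN-average pooling means averaging the k highest pairwise similarity scores between the two sets. The function names and the default k are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (m x d) and b (n x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def feature_average(a, b):
    """Feature-average pooling: average each set's features, then compare once."""
    mean_a = a.mean(axis=0, keepdims=True)
    mean_b = b.mean(axis=0, keepdims=True)
    return float(cosine_similarity_matrix(mean_a, mean_b)[0, 0])


def score_average(a, b):
    """Score-average pooling: mean of all pairwise similarity scores."""
    return float(cosine_similarity_matrix(a, b).mean())


def knn_average(a, b, k=3):
    """Assumed form of kNN-average pooling: mean of the k largest pairwise scores."""
    scores = cosine_similarity_matrix(a, b).ravel()
    k = min(k, scores.size)
    # After partitioning, the last k entries are the k largest scores.
    top_k = np.partition(scores, scores.size - k)[scores.size - k:]
    return float(top_k.mean())


# Toy sets of 2-D features; the third row of set_a plays the role of an outlier.
set_a = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
set_b = np.array([[1.0, 0.0], [1.0, 0.0]])

print("feature-average:", feature_average(set_a, set_b))
print("score-average:  ", score_average(set_a, set_b))
print("kNN-average:    ", knn_average(set_a, set_b, k=3))
```

In this toy case the single outlier lowers both the feature-average and the score-average similarity, while the assumed kNN-average score is unaffected because the outlier's pairs are not among the k best matches.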
|Journal: IEEE Transactions on Circuits and Systems for Video Technology
|Early online date: 31 May 2017
|Published - 1 Oct 2018